NEISSERIAL ANTIGENS



PCT/IB98/01665



This invention relates to antigens from Neisseria bacteria. 
BACKGROUND ART 

Neisseria meningitidis and Neisseria gonorrhoeae are non-motile, Gram-negative diplococci that
are pathogenic in humans. N. meningitidis colonises the pharynx and causes meningitis (and,
occasionally, septicaemia in the absence of meningitis); N. gonorrhoeae colonises the genital tract
and causes gonorrhoea. Although they colonise different areas of the body and cause completely
different diseases, the two pathogens are closely related. One feature that clearly
differentiates meningococcus from gonococcus is the polysaccharide capsule, which is
present in all pathogenic meningococci.

N. gonorrhoeae caused approximately 800,000 cases per year during the period 1983-1990 in the
United States alone (chapter by Meitzner & Cohen, "Vaccines Against Gonococcal Infection", In:
New Generation Vaccines, 2nd edition, ed. Levine, Woodrow, Kaper, & Cobon, Marcel Dekker,
New York, 1997, pp. 817-842). The disease causes significant morbidity but limited mortality.
Vaccination against N. gonorrhoeae would be highly desirable, but repeated attempts have failed.
The main candidate antigens for such a vaccine are surface-exposed proteins such as pili, porins,
opacity-associated proteins (Opas) and other surface-exposed proteins such as the Lip, Laz, IgA1
protease and transferrin-binding proteins. The lipooligosaccharide (LOS) has also been suggested
as a vaccine candidate (Meitzner & Cohen, supra).

N. meningitidis causes both endemic and epidemic disease. In the United States the attack rate is
0.6-1 per 100,000 persons per year, and it can be much greater during outbreaks (see Lieberman
et al (1996) Safety and Immunogenicity of a Serogroups A/C Neisseria meningitidis
Oligosaccharide-Protein Conjugate Vaccine in Young Children. JAMA 275(19):1499-1503;
Schuchat et al (1997) Bacterial Meningitis in the United States in 1995. N Engl J Med 337(14):970-976).
In developing countries, endemic disease rates are much higher and during epidemics
incidence rates can reach 500 cases per 100,000 persons per year. Mortality is extremely high, at
10-20% in the United States, and much higher in developing countries. Following the introduction
of the conjugate vaccine against Haemophilus influenzae, N. meningitidis is the major cause of
bacterial meningitis at all ages in the United States (Schuchat et al (1997) supra).
Based on the organism's capsular polysaccharide, 12 serogroups of N. meningitidis have been
identified. Group A is the pathogen most often implicated in epidemic disease in sub-Saharan
Africa. Serogroups B and C are responsible for the vast majority of cases in the United States and
in most developed countries. Serogroups W135 and Y are responsible for the rest of the cases in
the United States and developed countries. The meningococcal vaccine currently in use is a
tetravalent polysaccharide vaccine composed of serogroups A, C, Y and W135. Although
efficacious in adolescents and adults, it induces a poor immune response and short duration of
protection, and cannot be used in infants [eg. Morbidity and Mortality Weekly Report, Vol. 46, No.
RR-5 (1997)]. This is because polysaccharides are T-cell independent antigens that induce a weak
immune response that cannot be boosted by repeated immunization. Following the success of
vaccination against H. influenzae, conjugate vaccines against serogroups A and C have been
developed and are at the final stage of clinical testing (Zollinger WD "New and Improved Vaccines
Against Meningococcal Disease" in: New Generation Vaccines, supra, pp. 469-488; Lieberman et
al (1996) supra; Costantino et al (1992) Development and phase I clinical testing of a conjugate
vaccine against meningococcus A and C. Vaccine 10:691-698).

Meningococcus B remains a problem, however. This serogroup is currently responsible for
approximately 50% of total meningitis in the United States, Europe, and South America. The
polysaccharide approach cannot be used because the menB capsular polysaccharide is a polymer
of α(2-8)-linked N-acetyl neuraminic acid that is also present in mammalian tissue. This results in
tolerance to the antigen; indeed, if an immune response were elicited, it would be anti-self, and
therefore undesirable. In order to avoid induction of autoimmunity and to induce a protective
immune response, the capsular polysaccharide has, for instance, been chemically modified by
substituting the N-acetyl groups with N-propionyl groups, leaving the specific antigenicity
unaltered (Romero & Outschoorn (1994) Current status of Meningococcal group B vaccine
candidates: capsular or non-capsular? Clin Microbiol Rev 7(4):559-575).

Alternative approaches to menB vaccines have used complex mixtures of outer membrane proteins
(OMPs), containing either the OMPs alone, or OMPs enriched in porins, or OMPs depleted of the class 4
OMPs that are believed to induce antibodies that block bactericidal activity. This approach
produces vaccines that are not well characterized. They are able to protect against the homologous
strain, but are not effective at large, where there are many antigenic variants of the outer membrane
proteins. To overcome the antigenic variability, multivalent vaccines containing up to nine different
porins have been constructed (eg. Poolman JT (1992) Development of a meningococcal vaccine.
Infect Agents Dis. 4:13-28). Additional proteins to be used in outer membrane vaccines have been
the opa and opc proteins, but none of these approaches has been able to overcome the antigenic
variability (eg. Ala'Aldeen & Borriello (1996) The meningococcal transferrin-binding proteins 1
and 2 are both surface exposed and generate bactericidal antibodies capable of killing homologous
and heterologous strains. Vaccine 14(1):49-53).

A certain amount of sequence data is available for meningococcal and gonococcal genes and
proteins (eg. EP-A-0467714, WO96/29412), but this is by no means complete. The provision of
further sequences could provide an opportunity to identify secreted or surface-exposed proteins that
are presumed targets for the immune system and which are not antigenically variable. For instance,
some of the identified proteins could be components of efficacious vaccines against meningococcus
B, some could be components of vaccines against all meningococcal serotypes, and others could
be components of vaccines against all pathogenic Neisseria species.

THE INVENTION 

The invention provides proteins comprising the Neisserial amino acid sequences disclosed in the
examples. These sequences relate to N. meningitidis or N. gonorrhoeae.

It also provides proteins comprising sequences homologous (ie. having sequence identity) to the
Neisserial amino acid sequences disclosed in the examples. Depending on the particular sequence,
the degree of identity is preferably greater than 50% (eg. 65%, 80%, 90%, or more). These
homologous proteins include mutants and allelic variants of the sequences disclosed in the
examples. Typically, 50% identity or more between two proteins is considered to be an indication of
functional equivalence. Identity between the proteins is preferably determined by the Smith-Waterman
homology search algorithm as implemented in the MPSRCH program (Oxford Molecular), using an
affine gap search with parameters gap open penalty = 12 and gap extension penalty = 1.
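To illustrate how an affine-gap Smith-Waterman comparison works, the following is a minimal Python sketch of local alignment scoring with Gotoh's three-matrix recursion, using the gap open penalty of 12 and gap extension penalty of 1 given above. It is not the MPSRCH implementation. The match/mismatch scores are hypothetical placeholders for a real substitution matrix, and the gap convention (a gap of length k costs gap_open + (k - 1) × gap_extend) is an assumption. Percentage identity is computed separately from an existing alignment, counting identical non-gap columns over all aligned columns, which is also an assumed convention.

```python
def smith_waterman_affine(a: str, b: str, match: int = 5, mismatch: int = -4,
                          gap_open: int = 12, gap_extend: int = 1) -> int:
    """Best local alignment score of a and b with affine gap penalties.

    Assumed convention: a gap of length k costs gap_open + (k - 1) * gap_extend.
    match/mismatch are hypothetical stand-ins for a substitution matrix.
    """
    neg = -10**9  # effectively minus infinity for integer scores
    n, m = len(a), len(b)
    H = [[0] * (m + 1) for _ in range(n + 1)]    # best score ending at (i, j)
    E = [[neg] * (m + 1) for _ in range(n + 1)]  # b[j-1] aligned against a gap
    F = [[neg] * (m + 1) for _ in range(n + 1)]  # a[i-1] aligned against a gap
    best = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Either open a new gap (from H) or extend an existing one.
            E[i][j] = max(H[i][j - 1] - gap_open, E[i][j - 1] - gap_extend)
            F[i][j] = max(H[i - 1][j] - gap_open, F[i - 1][j] - gap_extend)
            s = match if a[i - 1] == b[j - 1] else mismatch
            # Local alignment: a cell's score never drops below zero.
            H[i][j] = max(0, H[i - 1][j - 1] + s, E[i][j], F[i][j])
            best = max(best, H[i][j])
    return best


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Identity over an existing alignment, with '-' marking gap positions."""
    if len(aligned_a) != len(aligned_b) or not aligned_a:
        raise ValueError("aligned sequences must be non-empty and of equal length")
    same = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-")
    return 100.0 * same / len(aligned_a)


print(smith_waterman_affine("ACDEFG", "ACDXEFG"))  # 18: six matches (30) minus one gap (12)
print(round(percent_identity("ACD-EFG", "ACDXEFG"), 2))  # 85.71
```

Tracing back through the three matrices to recover the alignment itself is omitted to keep the sketch short; a production comparison would use an established aligner with a proper substitution matrix.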

The invention further provides proteins comprising fragments of the Neisserial amino acid
sequences disclosed in the examples. The fragments should comprise at least n consecutive amino
acids from the sequences and, depending on the particular sequence, n is 7 or more (eg. 8, 10, 12,
14, 16, 18, 20 or more). Preferably the fragments comprise an epitope from the sequence.
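A sequence of L residues contains L - n + 1 overlapping fragments of n consecutive residues (none if L < n). The sketch below, assuming a plain one-letter-code string, enumerates them; the function name and the example peptide are hypothetical and not taken from the examples.

```python
def fragments(seq: str, n: int = 7) -> list[str]:
    """All overlapping runs of n consecutive residues in seq."""
    if n < 1:
        raise ValueError("n must be positive")
    # range() is empty when len(seq) < n, so short sequences yield no fragments.
    return [seq[i:i + n] for i in range(len(seq) - n + 1)]


print(fragments("MKTAYIAKQR", 7))  # ['MKTAYIA', 'KTAYIAK', 'TAYIAKQ', 'AYIAKQR']
```

Such a list could, for example, serve as candidate peptides for epitope screening, though whether a given fragment actually comprises an epitope has to be established experimentally.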
The proteins of the invention can, of course, be prepared by various means (eg. recombinant
expression, purification from cell culture, chemical synthesis etc.) and in various forms (eg. native,
fusions etc.). They are preferably prepared in substantially pure or isolated form (ie. substantially
free from other Neisserial or host cell proteins).

According to a further aspect, the invention provides antibodies which bind to these proteins. These
may be polyclonal or monoclonal and may be produced by any suitable means.

According to a further aspect, the invention provides nucleic acid comprising the Neisserial
nucleotide sequences disclosed in the examples. In addition, the invention provides nucleic acid
comprising sequences homologous (ie. having sequence identity) to the Neisserial nucleotide
sequences disclosed in the examples.

Furthermore, the invention provides nucleic acid which can hybridise to the Neisserial nucleic acid
disclosed in the examples, preferably under "high stringency" conditions (eg. 65°C in a 0.1x SSC,
0.5% SDS solution).

Nucleic acids comprising fragments of these sequences are also provided. These should comprise
at least n consecutive nucleotides from the Neisserial sequences and, depending on the particular
sequence, n is 10 or more (eg. 12, 14, 15, 18, 20, 25, 30, 35, 40 or more).
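When short fragments of this kind are used as oligonucleotide probes, a common rule of thumb for a first estimate of melting temperature is the Wallace rule, Tm ≈ 2 °C × (A + T) + 4 °C × (G + C). This rule is not part of the specification; it is only a rough guide for oligonucleotides of roughly 14-20 nt, ignores salt and formamide concentration, and is not a substitute for the hybridisation conditions stated above. The function name is hypothetical.

```python
def wallace_tm(probe: str) -> int:
    """Rough Tm (deg C) of a short DNA oligo by the Wallace rule: 2*(A+T) + 4*(G+C)."""
    p = probe.upper()
    if set(p) - set("ACGT"):
        raise ValueError("probe must contain only A, C, G and T")
    at = p.count("A") + p.count("T")
    gc = p.count("G") + p.count("C")
    return 2 * at + 4 * gc


print(wallace_tm("ATGCGTACGTTAGCA"))  # 44 (8 A/T, 7 G/C)
```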

According to a further aspect, the invention provides nucleic acid encoding the proteins and protein 
fragments of the invention. 

It should also be appreciated that the invention provides nucleic acid comprising sequences
complementary to those described above (eg. for antisense or probing purposes).
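For DNA, the complementary strand written 5' to 3' is the reverse complement: each base is replaced by its Watson-Crick partner (A↔T, G↔C) and the result is reversed. A minimal sketch, assuming an unambiguous A/C/G/T sequence (IUPAC ambiguity codes and RNA are not handled); the function name is hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence, returned 5' to 3'."""
    if set(seq.upper()) - set("ACGT"):
        raise ValueError("only A, C, G and T are handled in this sketch")
    # Complement each base, then reverse to restore 5'->3' orientation.
    return seq.translate(_COMPLEMENT)[::-1]


print(reverse_complement("ATGAAA"))  # TTTCAT
```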

Nucleic acid according to the invention can, of course, be prepared in many ways (eg. by chemical 
synthesis, from genomic or cDNA libraries, from the organism itself etc.) and can take various 
forms (eg. single stranded, double stranded, vectors, probes etc.). 

In addition, the term "nucleic acid" includes DNA and RNA, and also their analogues, such as
those containing modified backbones, and also peptide nucleic acids (PNA) etc.
According to a further aspect, the invention provides vectors comprising nucleotide sequences of
the invention (eg. expression vectors) and host cells transformed with such vectors.

According to a further aspect, the invention provides compositions comprising protein, antibody,
and/or nucleic acid according to the invention. These compositions may be suitable as vaccines,
for instance, or as diagnostic reagents, or as immunogenic compositions.

The invention also provides nucleic acid, protein, or antibody according to the invention for use
as medicaments (eg. as vaccines) or as diagnostic reagents. It also provides the use of nucleic acid,
protein, or antibody according to the invention in the manufacture of: (i) a medicament for treating
or preventing infection due to Neisserial bacteria; (ii) a diagnostic reagent for detecting the
presence of Neisserial bacteria or of antibodies raised against Neisserial bacteria; and/or (iii) a
reagent which can raise antibodies against Neisserial bacteria. Said Neisserial bacteria may be any
species or strain (such as N. gonorrhoeae, or any strain of N. meningitidis, such as strain A, strain
B or strain C).

The invention also provides a method of treating a patient, comprising administering to the patient
a therapeutically effective amount of nucleic acid, protein, and/or antibody according to the
invention.

According to further aspects, the invention provides various processes. 

A process for producing proteins of the invention is provided, comprising the step of culturing a 
host cell according to the invention under conditions which induce protein expression. 

A process for producing protein or nucleic acid of the invention is provided, wherein the protein
or nucleic acid is synthesised in part or in whole using chemical means.

A process for detecting polynucleotides of the invention is provided, comprising the steps of: (a)
contacting a nucleic acid probe according to the invention with a biological sample under hybridizing
conditions to form duplexes; and (b) detecting said duplexes.

A process for detecting proteins of the invention is provided, comprising the steps of: (a) contacting
an antibody according to the invention with a biological sample under conditions suitable for the
formation of antibody-antigen complexes; and (b) detecting said complexes.
A summary of standard techniques and procedures which may be employed in order to perform the
invention (eg. to utilise the disclosed sequences for vaccination or diagnostic purposes) follows. 
This summary is not a limitation on the invention but, rather, gives examples that may be used, but 
are not required. 

General

The practice of the present invention will employ, unless otherwise indicated, conventional
techniques of molecular biology, microbiology, recombinant DNA, and immunology, which are
within the skill of the art. Such techniques are explained fully in the literature eg. Sambrook,
Molecular Cloning: A Laboratory Manual, Second Edition (1989); DNA Cloning, Volumes I and
II (D.N. Glover ed. 1985); Oligonucleotide Synthesis (M.J. Gait ed. 1984); Nucleic Acid
Hybridization (B.D. Hames & S.J. Higgins eds. 1984); Transcription and Translation (B.D. Hames
& S.J. Higgins eds. 1984); Animal Cell Culture (R.I. Freshney ed. 1986); Immobilized Cells and
Enzymes (IRL Press, 1986); B. Perbal, A Practical Guide to Molecular Cloning (1984); the
Methods in Enzymology series (Academic Press, Inc.), especially volumes 154 & 155; Gene
Transfer Vectors for Mammalian Cells (J.H. Miller and M.P. Calos eds. 1987, Cold Spring Harbor
Laboratory); Mayer and Walker, eds. (1987), Immunochemical Methods in Cell and Molecular
Biology (Academic Press, London); Scopes, (1987) Protein Purification: Principles and Practice,
Second Edition (Springer-Verlag, N.Y.), and Handbook of Experimental Immunology, Volumes
I-IV (D.M. Weir and C.C. Blackwell eds. 1986).

Standard abbreviations for nucleotides and amino acids are used in this specification.

All publications, patents, and patent applications cited herein are incorporated in full by reference. 
In particular, the contents of UK patent applications 9723516.2, 9724190.5, 9724386.9, 9725158.1, 
9726147.3, 9800759.4, and 9819016.8 are incorporated herein. 

Definitions 

A composition containing X is "substantially free of" Y when at least 85% by weight of the total
X+Y in the composition is X. Preferably, X comprises at least about 90% by weight of the total of
X+Y in the composition, more preferably at least about 95% or even 99% by weight.
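The definition reduces to a weight ratio: X is substantially free of Y when X / (X + Y) ≥ 0.85. For instance, 9 mg of X with 1 mg of Y gives 90% X, which satisfies both the 85% threshold and the preferred "about 90%" tier. A minimal sketch; the function name and the idea of passing the preferred tiers as a threshold argument are illustrative assumptions.

```python
def is_substantially_free(mass_x: float, mass_y: float, threshold: float = 0.85) -> bool:
    """True if X makes up at least `threshold` of the combined weight of X + Y."""
    total = mass_x + mass_y
    if total <= 0:
        raise ValueError("total weight must be positive")
    return mass_x / total >= threshold


print(is_substantially_free(9.0, 1.0))        # True (90% X)
print(is_substantially_free(8.0, 2.0))        # False (80% X)
print(is_substantially_free(9.6, 0.4, 0.95))  # True (96% X against the 95% tier)
```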

The term "comprising" means "including" as well as "consisting" eg. a composition "comprising" 
X may consist exclusively of X or may include something additional to X, such as X+Y. 
The term "heterologous" refers to two biological components that are not found together in nature.
The components may be host cells, genes, or regulatory regions, such as promoters. Although the
heterologous components are not found together in nature, they can function together, as when a
promoter heterologous to a gene is operably linked to the gene. Another example is where a
Neisserial sequence is heterologous to a mouse host cell. A further example would be two epitopes
from the same or different proteins which have been assembled in a single protein in an
arrangement not found in nature.

An "origin of replication" is a polynucleotide sequence that initiates and regulates replication of
polynucleotides, such as an expression vector. The origin of replication behaves as an autonomous
unit of polynucleotide replication within a cell, capable of replication under its own control. An
origin of replication may be needed for a vector to replicate in a particular host cell. With certain
origins of replication, an expression vector can be reproduced at a high copy number in the
presence of the appropriate proteins within the cell. Examples of origins are the autonomously
replicating sequences, which are effective in yeast; and the SV40 origin, which is effective in COS-7
cells because these cells express the viral T-antigen.

A "mutant" sequence is defined as a DNA, RNA or amino acid sequence differing from but having
sequence identity with the native or disclosed sequence. Depending on the particular sequence, the
degree of sequence identity between the native or disclosed sequence and the mutant sequence is
preferably greater than 50% (eg. 60%, 70%, 80%, 90%, 95%, 99% or more, calculated using the
Smith-Waterman algorithm as described above). As used herein, an "allelic variant" of a nucleic
acid molecule, or region, for which nucleic acid sequence is provided herein is a nucleic acid
molecule, or region, that occurs essentially at the same locus in the genome of another or second
isolate, and that, due to natural variation caused by, for example, mutation or recombination, has
a similar but not identical nucleic acid sequence. A coding region allelic variant typically encodes
a protein having similar activity to that of the protein encoded by the gene to which it is being
compared. An allelic variant can also comprise an alteration in the 5' or 3' untranslated regions of
the gene, such as in regulatory control regions (eg. see US patent 5,753,235).

Expression systems 

The Neisserial nucleotide sequences can be expressed in a variety of different expression systems;
for example those used with mammalian cells, baculoviruses, plants, bacteria, and yeast.
i. Mammalian Systems

Mammalian expression systems are known in the art. A mammalian promoter is any DNA
sequence capable of binding mammalian RNA polymerase and initiating the downstream (3')
transcription of a coding sequence (eg. structural gene) into mRNA. A promoter will have a
transcription initiating region, which is usually placed proximal to the 5' end of the coding
sequence, and a TATA box, usually located 25-30 base pairs (bp) upstream of the transcription
initiation site. The TATA box is thought to direct RNA polymerase II to begin RNA synthesis at
the correct site. A mammalian promoter will also contain an upstream promoter element, usually
located within 100 to 200 bp upstream of the TATA box. An upstream promoter element
determines the rate at which transcription is initiated and can act in either orientation [Sambrook
et al. (1989) "Expression of Cloned Genes in Mammalian Cells." In Molecular Cloning: A
Laboratory Manual, 2nd ed.].
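The positional rule above (a TATA box about 25-30 bp upstream of the transcription start site) can be expressed as a simple sequence scan. The sketch below is a hypothetical illustration, not a validated promoter predictor: it assumes a simplified TATAWAW consensus (W = A or T), a known 0-based start-site index, and measures distance from the first base of the motif to the start site. Real TATA elements vary and many promoters lack one.

```python
import re

# Hypothetical simplified TATA consensus: TATAWAW (W = A or T).
# The lookahead lets overlapping candidates all be reported.
TATA_MOTIF = re.compile(r"(?=(TATA[AT]A[AT]))")


def tata_candidates(seq: str, tss: int, min_up: int = 25, max_up: int = 30) -> list[int]:
    """0-based starts of TATA-like motifs lying min_up..max_up bp upstream of tss."""
    hits = []
    for m in TATA_MOTIF.finditer(seq.upper()):
        distance = tss - m.start()  # bp from motif start to transcription start
        if min_up <= distance <= max_up:
            hits.append(m.start())
    return hits


seq = "C" * 5 + "TATAAAT" + "C" * 30
print(tata_candidates(seq, tss=33))  # [5]: motif starts 28 bp upstream
print(tata_candidates(seq, tss=45))  # []: 40 bp is outside the 25-30 bp window
```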

Mammalian viral genes are often highly expressed and have a broad host range; therefore sequences
encoding mammalian viral genes provide particularly useful promoter sequences. Examples include
the SV40 early promoter, mouse mammary tumor virus LTR promoter, adenovirus major late
promoter (Ad MLP), and herpes simplex virus promoter. In addition, sequences derived from
non-viral genes, such as the murine metallothionein gene, also provide useful promoter sequences.
Expression may be either constitutive or regulated (inducible), depending on the promoter; for
example, the mouse mammary tumor virus LTR promoter can be induced with glucocorticoid in
hormone-responsive cells.

The presence of an enhancer element (enhancer), combined with the promoter elements described
above, will usually increase expression levels. An enhancer is a regulatory DNA sequence that can
stimulate transcription up to 1000-fold when linked to homologous or heterologous promoters, with
synthesis beginning at the normal RNA start site. Enhancers are also active when they are placed
upstream or downstream from the transcription initiation site, in either normal or flipped orientation,
or at a distance of more than 1000 nucleotides from the promoter [Maniatis et al. (1987)
Science 236:1237; Alberts et al. (1989) Molecular Biology of the Cell, 2nd ed.]. Enhancer elements
derived from viruses may be particularly useful, because they usually have a broader host range.
Examples include the SV40 early gene enhancer [Dijkema et al (1985) EMBO J. 4:761] and the
enhancer/promoters derived from the long terminal repeat (LTR) of the Rous Sarcoma Virus
[Gorman et al. (1982b) Proc. Natl. Acad. Sci. 79:6717] and from human cytomegalovirus [Boshart
et al. (1985) Cell 47:521]. Additionally, some enhancers are regulatable and become active only
in the presence of an inducer, such as a hormone or metal ion [Sassone-Corsi and Borelli (1986)
Trends Genet. 2:215; Maniatis et al. (1987) Science 236:1237].

A DNA molecule may be expressed intracellularly in mammalian cells. A promoter sequence may be
directly linked with the DNA molecule, in which case the first amino acid at the N-terminus of the
recombinant protein will always be a methionine, which is encoded by the ATG start codon. If desired,
the N-terminus may be cleaved from the protein by in vitro incubation with cyanogen bromide.
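Cyanogen bromide cleaves peptide bonds on the C-terminal side of every methionine residue, not only the N-terminal one, so this approach releases an intact mature protein only when the protein contains no internal Met. The sketch below predicts the resulting fragments from a one-letter sequence; it is a hypothetical helper that ignores chemical details such as conversion of the cleaved Met to homoserine lactone and any incomplete cleavage.

```python
def cnbr_fragments(protein: str) -> list[str]:
    """Split a protein sequence after every methionine (M), mimicking
    cyanogen bromide cleavage on the C-terminal side of Met residues."""
    pieces, current = [], []
    for residue in protein:
        current.append(residue)
        if residue == "M":
            # Cleavage occurs right after this Met.
            pieces.append("".join(current))
            current = []
    if current:
        pieces.append("".join(current))
    return pieces


print(cnbr_fragments("MAKMGG"))  # ['M', 'AKM', 'GG']: the internal Met is also cut
```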

Alternatively, foreign proteins can also be secreted from the cell into the growth media by creating
chimeric DNA molecules that encode a fusion protein comprised of a leader sequence fragment that
provides for secretion of the foreign protein in mammalian cells. Preferably, there are processing
sites encoded between the leader fragment and the foreign gene that can be cleaved either in vivo
or in vitro. The leader sequence fragment usually encodes a signal peptide comprised of
hydrophobic amino acids which direct the secretion of the protein from the cell. The adenovirus
tripartite leader is an example of a leader sequence that provides for secretion of a foreign protein
in mammalian cells.
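Because signal peptides are characterised by a stretch of hydrophobic residues near the N-terminus, one simple, commonly used heuristic for inspecting a candidate leader is a sliding-window Kyte-Doolittle hydropathy average. The sketch below is that swapped-in heuristic, not a method from this specification and not a signal peptide predictor; the window of 9 residues and the 30-residue N-terminal region are hypothetical defaults.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def max_hydropathy(seq: str, window: int = 9, region: int = 30) -> float:
    """Highest mean Kyte-Doolittle score over any `window`-residue stretch
    within the first `region` residues (where N-terminal leaders usually sit)."""
    head = seq.upper()[:region]
    if len(head) < window:
        raise ValueError("sequence region is shorter than the window")
    scores = [KYTE_DOOLITTLE[aa] for aa in head]
    return max(sum(scores[i:i + window]) / window
               for i in range(len(scores) - window + 1))


print(round(max_hydropathy("M" + "L" * 19), 2))  # 3.8 (strongly hydrophobic core)
print(round(max_hydropathy("D" * 12), 2))        # -3.5 (hydrophilic)
```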

Usually, transcription termination and polyadenylation sequences recognized by mammalian cells
are regulatory regions located 3' to the translation stop codon and thus, together with the promoter
elements, flank the coding sequence. The 3' terminus of the mature mRNA is formed by site-specific
post-transcriptional cleavage and polyadenylation [Birnstiel et al. (1985) Cell 47:349;
Proudfoot and Whitelaw (1988) "Termination and 3' end processing of eukaryotic RNA." In
Transcription and Splicing (ed. B.D. Hames and D.M. Glover); Proudfoot (1989) Trends Biochem.
Sci. 74:105]. These sequences direct the transcription of an mRNA which can be translated into the
polypeptide encoded by the DNA. Examples of transcription terminator/polyadenylation signals
include those derived from SV40 [Sambrook et al (1989) "Expression of cloned genes in cultured
mammalian cells." In Molecular Cloning: A Laboratory Manual].

Usually, the above-described components, comprising a promoter, polyadenylation signal, and
transcription termination sequence, are put together into expression constructs. Enhancers, introns
with functional splice donor and acceptor sites, and leader sequences may also be included in an
expression construct, if desired. Expression constructs are often maintained in a replicon, such as
an extrachromosomal element (eg. plasmids) capable of stable maintenance in a host, such as
mammalian cells or bacteria. Mammalian replication systems include those derived from animal
viruses, which require trans-acting factors to replicate. For example, plasmids containing the
replication systems of papovaviruses, such as SV40 [Gluzman (1981) Cell 23:175] or
polyomavirus, replicate to extremely high copy number in the presence of the appropriate viral T
antigen. Additional examples of mammalian replicons include those derived from bovine
papillomavirus and Epstein-Barr virus. Additionally, the replicon may have two replication systems,
thus allowing it to be maintained, for example, in mammalian cells for expression and in a
prokaryotic host for cloning and amplification. Examples of such mammalian-bacteria shuttle
vectors include pMT2 [Kaufman et al. (1989) Mol Cell Biol 9:946] and pHEBO [Shimizu et al.
(1986) Mol Cell Biol 5:1074].

The transformation procedure used depends upon the host to be transformed. Methods for
introduction of heterologous polynucleotides into mammalian cells are known in the art and include
dextran-mediated transfection, calcium phosphate precipitation, polybrene mediated transfection,
protoplast fusion, electroporation, encapsulation of the polynucleotide(s) in liposomes, and direct
microinjection of the DNA into nuclei.

Mammalian cell lines available as hosts for expression are known in the art and include many
immortalized cell lines available from the American Type Culture Collection (ATCC), including
but not limited to, Chinese hamster ovary (CHO) cells, HeLa cells, baby hamster kidney (BHK)
cells, monkey kidney cells (COS), human hepatocellular carcinoma cells (eg. Hep G2), and a
number of other cell lines.

ii. Baculovirus Systems

The polynucleotide encoding the protein can also be inserted into a suitable insect expression vector,
where it is operably linked to the control elements within that vector. Vector construction employs
techniques which are known in the art. Generally, the components of the expression system include
a transfer vector, usually a bacterial plasmid, which contains both a fragment of the baculovirus
genome and a convenient restriction site for insertion of the heterologous gene or genes to be
expressed; a wild type baculovirus with a sequence homologous to the baculovirus-specific fragment
in the transfer vector (this allows for the homologous recombination of the heterologous gene into
the baculovirus genome); and appropriate insect host cells and growth media.

After inserting the DNA sequence encoding the protein into the transfer vector, the vector and the
wild type viral genome are transfected into an insect host cell where the vector and viral genome
are allowed to recombine. The packaged recombinant virus is expressed and recombinant plaques
are identified and purified. Materials and methods for baculovirus/insect cell expression systems
are commercially available in kit form from, inter alia, Invitrogen, San Diego CA ("MaxBac" kit).
These techniques are generally known to those skilled in the art and fully described in Summers
and Smith, Texas Agricultural Experiment Station Bulletin No. 1555 (1987) (hereinafter "Summers
and Smith").

Prior to inserting the DNA sequence encoding the protein into the baculovirus genome, the
above-described components, comprising a promoter, leader (if desired), coding sequence of interest, and
transcription termination sequence, are usually assembled into an intermediate transplacement
construct (transfer vector). This construct may contain a single gene and operably linked regulatory
elements; multiple genes, each with its own set of operably linked regulatory elements; or multiple
genes, regulated by the same set of regulatory elements. Intermediate transplacement constructs are
often maintained in a replicon, such as an extrachromosomal element (eg. plasmids) capable of stable
maintenance in a host, such as a bacterium. The replicon will have a replication system, thus allowing
it to be maintained in a suitable host for cloning and amplification.

Currently, the most commonly used transfer vector for introducing foreign genes into AcNPV is
pAc373. Many other vectors, known to those of skill in the art, have also been designed. These
include, for example, pVL985 (which alters the polyhedrin start codon from ATG to ATT, and
which introduces a BamHI cloning site 32 base pairs downstream from the ATT; see Luckow and
Summers, Virology (1989) 77:31).
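Before cloning an insert into a BamHI site such as the one described above, it is common to check whether the insert itself contains an internal BamHI recognition sequence (GGATCC). A minimal sketch; the helper name is hypothetical. Because GGATCC is palindromic (it is its own reverse complement), scanning one strand finds sites on both strands.

```python
import re

BAMHI_SITE = "GGATCC"  # BamHI recognition sequence (cuts G^GATCC)


def find_sites(seq: str, site: str = BAMHI_SITE) -> list[int]:
    """0-based start positions of every (possibly overlapping) occurrence of site."""
    pattern = f"(?={re.escape(site)})"  # zero-width lookahead allows overlaps
    return [m.start() for m in re.finditer(pattern, seq.upper())]


print(find_sites("AAGGATCCTTGGATCC"))  # [2, 10]
print(find_sites("ACGTACGT"))          # []: no internal BamHI site
```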

The plasmid usually also contains the polyhedrin polyadenylation signal (Miller et al. (1988) Ann.
Rev. Microbiol. 42:111) and a prokaryotic ampicillin-resistance (amp) gene and origin of
replication for selection and propagation in E. coli.

Baculovirus transfer vectors usually contain a baculovirus promoter. A baculovirus promoter is any
DNA sequence capable of binding a baculovirus RNA polymerase and initiating the downstream
(5' to 3') transcription of a coding sequence (eg. structural gene) into mRNA. A promoter will have
a transcription initiation region which is usually placed proximal to the 5' end of the coding
sequence. This transcription initiation region usually includes an RNA polymerase binding site and
a transcription initiation site. A baculovirus transfer vector may also have a second domain called
an enhancer, which, if present, is usually distal to the structural gene. Expression may be either
regulated or constitutive.
Structural genes, abundantly transcribed at late times in a viral infection cycle, provide particularly
useful promoter sequences. Examples include sequences derived from the gene encoding the viral
polyhedrin protein, Friesen et al., (1986) "The Regulation of Baculovirus Gene Expression," in:
The Molecular Biology of Baculoviruses (ed. Walter Doerfler); EPO Publ. Nos. 127 839 and 155
476; and the gene encoding the p10 protein, Vlak et al., (1988), J. Gen. Virol. 69:765.

DNA encoding suitable signal sequences can be derived from genes for secreted insect or
baculovirus proteins, such as the baculovirus polyhedrin gene (Carbonell et al. (1988) Gene,
73:409). Alternatively, since the signals for mammalian cell posttranslational modifications (such
as signal peptide cleavage, proteolytic cleavage, and phosphorylation) appear to be recognized by
insect cells, and the signals required for secretion and nuclear accumulation also appear to be
conserved between the invertebrate cells and vertebrate cells, leaders of non-insect origin, such as
those derived from genes encoding human α-interferon, Maeda et al., (1985), Nature 315:592;
human gastrin-releasing peptide, Lebacq-Verheyden et al., (1988), Molec. Cell Biol. 5:3129;
human IL-2, Smith et al., (1985) Proc. Nat'l Acad. Sci. USA, 52:8404; mouse IL-3, Miyajima et
al., (1987) Gene 55:273; and human glucocerebrosidase, Martin et al. (1988) DNA, 7:99, can also
be used to provide for secretion in insects.

A recombinant polypeptide or polyprotein may be expressed intracellularly or, if it is expressed
with the proper regulatory sequences, it can be secreted. Good intracellular expression of nonfused
foreign proteins usually requires heterologous genes that ideally have a short leader sequence
containing suitable translation initiation signals preceding an ATG start signal. If desired,
methionine at the N-terminus may be cleaved from the mature protein by in vitro incubation with
cyanogen bromide.

Alternatively, recombinant polyproteins or proteins which are not naturally secreted can be secreted
from the insect cell by creating chimeric DNA molecules that encode a fusion protein comprised
of a leader sequence fragment that provides for secretion of the foreign protein in insects. The
leader sequence fragment usually encodes a signal peptide comprised of hydrophobic amino acids
which direct the translocation of the protein into the endoplasmic reticulum.

After insertion of the DNA sequence and/or the gene encoding the expression product precursor
of the protein, an insect cell host is co-transformed with the heterologous DNA of the transfer
vector and the genomic DNA of wild type baculovirus, usually by co-transfection. The promoter
and transcription termination sequence of the construct will usually comprise a 2-5 kb section of the
baculovirus genome. Methods for introducing heterologous DNA into the desired site in the
baculovirus are known in the art (see Summers and Smith supra; Ju et al. (1987); Smith et
al., Mol Cell Biol (1983) 3:2156; and Luckow and Summers (1989)). For example, the insertion
can be into a gene such as the polyhedrin gene, by homologous double crossover recombination;
insertion can also be into a restriction enzyme site engineered into the desired baculovirus gene
(Miller et al. (1989) Bioessays 4:91). The DNA sequence, when cloned in place of the polyhedrin
gene in the expression vector, is flanked both 5' and 3' by polyhedrin-specific sequences and is
positioned downstream of the polyhedrin promoter.

The newly formed baculovirus expression vector is subsequently packaged into an infectious recombinant baculovirus. Homologous recombination occurs at low frequency (between about 1% and about 5%); thus, the majority of the virus produced after co-transfection is still wild-type virus. Therefore, a method is necessary to identify recombinant viruses. An advantage of the expression system is a visual screen allowing recombinant viruses to be distinguished. The polyhedrin protein, which is produced by the native virus, is produced at very high levels in the nuclei of infected cells at late times after viral infection. Accumulated polyhedrin protein forms occlusion bodies that also contain embedded particles. These occlusion bodies, up to 15 μm in size, are highly refractile, giving them a bright shiny appearance that is readily visualized under the light microscope. Cells infected with recombinant viruses, in which the polyhedrin gene has been replaced by the heterologous sequence, lack occlusion bodies. To distinguish recombinant virus from wild-type virus, the transfection supernatant is plaqued onto a monolayer of insect cells by techniques known to those skilled in the art, and the plaques are screened under the light microscope for the presence (indicative of wild-type virus) or absence (indicative of recombinant virus) of occlusion bodies. "Current Protocols in Molecular Biology" Vol. 2 (Ausubel et al. eds) at 16.8 (Supp. 10, 1990); Summers and Smith, supra; Miller et al. (1989).
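Because only about 1% to 5% of the progeny virus is recombinant, the number of plaques that must be screened can be estimated. Assuming (this is not stated in the source) that each plaque is independently recombinant with probability p, the chance of finding at least one recombinant among n plaques is 1 - (1 - p)^n, and solving for n gives the minimum number of plaques for a chosen confidence level. The function names below are hypothetical.

```python
import math


def prob_at_least_one_recombinant(p: float, n: int) -> float:
    """Probability that at least one of n independently sampled plaques
    is recombinant, if each plaque is recombinant with probability p."""
    return 1.0 - (1.0 - p) ** n


def plaques_to_screen(p: float, confidence: float = 0.95) -> int:
    """Smallest n such that 1 - (1 - p)**n >= confidence."""
    if not (0.0 < p < 1.0 and 0.0 < confidence < 1.0):
        raise ValueError("p and confidence must lie strictly between 0 and 1")
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p))


if __name__ == "__main__":
    # Lower and upper ends of the 1-5% recombination frequency range.
    for p in (0.01, 0.05):
        n = plaques_to_screen(p)
        print(f"p={p:.2f}: screen {n} plaques "
              f"(P(>=1 recombinant) = {prob_at_least_one_recombinant(p, n):.3f})")
```

Under this model, about 299 plaques give 95% confidence of finding a recombinant when p = 0.01, and about 59 when p = 0.05.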

Recombinant baculovirus expression vectors have been developed for infection into several insect cells. For example, recombinant baculoviruses have been developed for, inter alia: Aedes aegypti, Autographa californica, Bombyx mori, Drosophila melanogaster, Spodoptera frugiperda, and Trichoplusia ni (WO 89/046699; Carbonell et al., (1985) J. Virol. 56:153; Wright (1986) Nature 321:718; Smith et al., (1983) Mol. Cell Biol. 3:2156; and see generally, Fraser et al. (1989) In Vitro Cell Dev. Biol. 25:225).
Cells and cell culture media are commercially available for both direct and fusion expression of heterologous polypeptides in a baculovirus/expression system; cell culture technology is generally known to those skilled in the art. See, eg. Summers and Smith, supra.

The modified insect cells may then be grown in an appropriate nutrient medium, which allows for stable maintenance of the plasmid(s) present in the modified insect host. Where the expression product gene is under inducible control, the host may be grown to high density and expression then induced. Alternatively, where expression is constitutive, the product will be continuously expressed into the medium, and the nutrient medium must be continuously circulated while the product of interest is removed and depleted nutrients are replenished. The product may be purified by such techniques as chromatography, eg. HPLC, affinity chromatography, ion exchange chromatography, etc.; electrophoresis; density gradient centrifugation; solvent extraction, or the like. As appropriate, the product may be further purified so as to remove substantially any insect proteins which are also secreted into the medium or result from lysis of insect cells, so as to provide a product which is at least substantially free of host debris, eg. proteins, lipids and polysaccharides.

In order to obtain protein expression, recombinant host cells derived from the transformants are incubated under conditions which allow expression of the recombinant protein-encoding sequence. These conditions will vary, depending upon the host cell selected. However, the conditions are readily ascertainable to those of ordinary skill in the art, based upon what is known in the art.

iii. Plant Systems 

There are many plant cell culture and whole plant genetic expression systems known in the art. Exemplary plant cellular genetic expression systems include those described in patents such as US 5,693,506; US 5,659,122; and US 5,608,143. Additional examples of genetic expression in plant cell culture have been described by Zenk, Phytochemistry 30:3861-3863 (1991). In addition to the references described above, descriptions of plant protein signal peptides may be found in Baulcombe et al., Mol. Gen. Genet. 209:33-40 (1987); Chandler et al., Plant Molecular Biology 3:407-418 (1984); Rogers, J. Biol. Chem. 260:3731-3738 (1985); Rothstein et al., Gene 55:353-356 (1987); Whittier et al., Nucleic Acids Research 15:2515-2535 (1987); Wirsel et al., Molecular Microbiology 3:3-14 (1989); Yu et al., Gene 122:247-253 (1992). A description of the regulation of plant gene expression by the phytohormone gibberellic acid, and of secreted enzymes induced by gibberellic acid, can be found in R.L. Jones and J. MacMillan, Gibberellins, in: Advanced Plant Physiology, Malcolm B. Wilkins, ed., 1984, Pitman Publishing Limited, London, pp. 21-52.
References that describe other metabolically regulated genes: Sheen, Plant Cell 2:1027-1038 (1990); Maas et al., EMBO J. 9:3447-3452 (1990); Benkel and Hickey, Proc. Natl. Acad. Sci. USA 84:1337-1339 (1987).

Typically, using techniques known in the art, a desired polynucleotide sequence is inserted into an expression cassette comprising genetic regulatory elements designed for operation in plants. The expression cassette is inserted into a desired expression vector with companion sequences upstream and downstream of the expression cassette suitable for expression in a plant host. The companion sequences will be of plasmid or viral origin and provide the characteristics necessary for the vector to move DNA from an original cloning host, such as bacteria, to the desired plant host. The basic bacterial/plant vector construct will preferably provide a broad host range prokaryote replication origin; a prokaryote selectable marker; and, for Agrobacterium transformations, T-DNA sequences for Agrobacterium-mediated transfer to plant chromosomes. Where the heterologous gene is not readily amenable to detection, the construct will preferably also have a selectable marker gene suitable for determining whether a plant cell has been transformed. A general review of suitable markers, for example for the members of the grass family, is found in Wilmink and Dons, 1993, Plant Mol. Biol. Reptr. 11(2):165-185.

Sequences suitable for permitting integration of the heterologous sequence into the plant genome are also recommended. These might include transposon sequences and the like for homologous recombination, as well as Ti sequences which permit random insertion of a heterologous expression cassette into a plant genome. Suitable prokaryote selectable markers include resistance toward antibiotics such as ampicillin or tetracycline. Other DNA sequences encoding additional functions may also be present in the vector, as is known in the art.

The nucleic acid molecules of the subject invention may be included in an expression cassette for expression of the protein(s) of interest. Usually, there will be only one expression cassette, although two or more are feasible. In addition to the heterologous protein-encoding sequence, the recombinant expression cassette will contain the following elements: a promoter region, plant 5' untranslated sequences, an initiation codon (depending upon whether or not the structural gene comes equipped with one), and a transcription and translation termination sequence. Unique restriction enzyme sites at the 5' and 3' ends of the cassette allow for easy insertion into a pre-existing vector.
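The cassette layout just described (flanking restriction sites, promoter, plant 5' untranslated sequence, an initiation codon only if the structural gene lacks one, coding sequence, termination sequence) can be modelled as a simple data structure. This is an illustrative sketch only: the class and field names are hypothetical, the example sequences are toy placeholders rather than real plant regulatory elements, and the uniqueness check treats restriction sites as plain strings on one strand.

```python
from dataclasses import dataclass


def count_occurrences(seq: str, site: str) -> int:
    """Count (possibly overlapping) occurrences of site in seq."""
    count, start = 0, 0
    while True:
        idx = seq.find(site, start)
        if idx == -1:
            return count
        count += 1
        start = idx + 1


@dataclass
class ExpressionCassette:
    five_prime_site: str    # restriction site at the 5' end (hypothetical choice)
    promoter: str
    utr5: str               # plant 5' untranslated sequence
    coding: str             # heterologous protein-encoding sequence
    terminator: str         # transcription/translation termination sequence
    three_prime_site: str   # restriction site at the 3' end

    def sequence(self) -> str:
        coding = self.coding.upper()
        # Supply an initiation codon only if the structural gene lacks one.
        start = "" if coding.startswith("ATG") else "ATG"
        parts = [self.five_prime_site, self.promoter, self.utr5,
                 start + coding, self.terminator, self.three_prime_site]
        return "".join(p.upper() for p in parts)

    def flanking_sites_unique(self) -> bool:
        """True if each end site occurs exactly once in the whole cassette."""
        seq = self.sequence()
        return (count_occurrences(seq, self.five_prime_site.upper()) == 1
                and count_occurrences(seq, self.three_prime_site.upper()) == 1)


cassette = ExpressionCassette(
    five_prime_site="GAATTC",   # EcoRI site
    promoter="TTTTTTTT",        # placeholder, not a real promoter
    utr5="CCCC",                # placeholder 5' UTR
    coding="GCTGCTTAA",         # lacks ATG, so one is added
    terminator="AAAAAAAA",      # placeholder terminator
    three_prime_site="GGATCC",  # BamHI site
)
print(cassette.sequence())
print(cassette.flanking_sites_unique())
```

If the coding sequence itself contained a GAATTC site, the check would fail, signalling that a different flanking enzyme should be chosen.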
A heterologous coding sequence may be for any protein relating to the present invention. The sequence encoding the protein of interest will encode a signal peptide which allows processing and translocation of the protein, as appropriate, and will usually lack any sequence which might result in the binding of the desired protein of the invention to a membrane. Since, for the most part, the transcriptional initiation region will be for a gene which is expressed and translocated during germination, by employing the signal peptide which provides for translocation, one may also provide for translocation of the protein of interest. In this way, the protein(s) of interest will be translocated from the cells in which they are expressed and may be efficiently harvested. Typically, secretion in seeds is across the aleurone or scutellar epithelium layer into the endosperm of the seed. While it is not required that the protein be secreted from the cells in which it is produced, secretion facilitates the isolation and purification of the recombinant protein.

Since the ultimate expression of the desired gene product will be in a eukaryotic cell, it is desirable to determine whether any portion of the cloned gene contains sequences which will be processed out as introns by the host's spliceosome machinery. If so, site-directed mutagenesis of the "intron" region may be conducted to prevent losing a portion of the genetic message as a false intron code (Reed and Maniatis, Cell 41:95-105, 1985).
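A first-pass screen for sequences that the host might splice out can look for the canonical GT...AG intron boundaries. The sketch below is a crude heuristic under stated assumptions: it treats GTRAG (R = A or G) as a simplified 5' splice-site core and the nearest downstream AG within a length window as the 3' site. The length thresholds are arbitrary placeholders, the function name is hypothetical, and real splice-site prediction (particularly for plant introns) relies on much richer models, so candidates found this way would still need experimental confirmation.

```python
import re

# Simplified 5' splice-site core "GTRAG" (R = A or G); the lookahead lets
# overlapping candidates be reported.
DONOR = re.compile(r"(?=GT[AG]AG)")


def candidate_introns(seq: str, min_len: int = 60,
                      max_len: int = 2000) -> list[tuple[int, int]]:
    """Return (start, end) pairs for putative GT...AG introns.

    start is the index of the G in GT; end is the index just past the AG,
    so seq[start:end] is the candidate intron. For each donor, only the
    nearest acceptor AG giving a length in [min_len, max_len] is reported.
    """
    seq = seq.upper()
    hits = []
    for match in DONOR.finditer(seq):
        start = match.start()
        # Earliest AG position that yields an intron of at least min_len nt.
        first_ag = start + min_len - 2
        # str.find only reports matches lying entirely before the end bound,
        # which caps the intron length at max_len.
        pos = seq.find("AG", first_ag, start + max_len)
        if pos != -1:
            hits.append((start, pos + 2))
    return hits


if __name__ == "__main__":
    toy = "CCC" + "GTAAGT" + "T" * 60 + "CAG" + "GGG"
    print(candidate_introns(toy))  # [(3, 72)]
```

A flagged span would then be a candidate for the site-directed mutagenesis described above, for example by a silent codon change that destroys the GT or AG.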

The vector can be microinjected directly into plant cells by use of micropipettes to mechanically transfer the recombinant DNA (Crossway, Mol. Gen. Genet. 202:179-185, 1985). The genetic material may also be transferred into the plant cell by using polyethylene glycol (Krens et al., Nature 296:72-74, 1982). Another method of introduction of nucleic acid segments is high-velocity ballistic penetration by small particles with the nucleic acid either within the matrix of small beads or particles, or on the surface (Klein et al., Nature 327:70-73, 1987; and Knudsen and Muller, 1991, Planta 185:330-336, teaching particle bombardment of barley endosperm to create transgenic barley). Yet another method of introduction would be fusion of protoplasts with other entities, either minicells, cells, liposomes or other fusible lipid-surfaced bodies (Fraley et al., Proc. Natl. Acad. Sci. USA 79:1859-1863, 1982).

The vector may also be introduced into the plant cells by electroporation (Fromm et al., Proc. Natl. Acad. Sci. USA 82:5824, 1985). In this technique, plant protoplasts are electroporated in the presence of plasmids containing the gene construct. Electrical impulses of high field strength reversibly permeabilize biomembranes, allowing the introduction of the plasmids. Electroporated plant protoplasts reform the cell wall, divide, and form plant callus.
All plants from which protoplasts can be isolated and cultured to give whole regenerated plants can be transformed by the present invention so that whole plants are recovered which contain the transferred gene. It is known that practically all plants can be regenerated from cultured cells or tissues, including but not limited to all major species of sugarcane, sugar beet, cotton, fruit and other trees, legumes and vegetables. Some suitable plants include, for example, species from the genera Fragaria, Lotus, Medicago, Onobrychis, Trifolium, Trigonella, Vigna, Citrus, Linum, Geranium, Manihot, Daucus, Arabidopsis, Brassica, Raphanus, Sinapis, Atropa, Capsicum, Datura, Hyoscyamus, Lycopersicon, Nicotiana, Solanum, Petunia, Digitalis, Majorana, Cichorium, Helianthus, Lactuca, Bromus, Asparagus, Antirrhinum, Hemerocallis, Nemesia, Pelargonium, Panicum, Pennisetum, Ranunculus, Senecio, Salpiglossis, Cucumis, Browallia, Glycine, Lolium, Zea, Triticum and Sorghum.

Means for regeneration vary from species to species of plants, but generally a suspension of transformed protoplasts containing copies of the heterologous gene is first provided. Callus tissue is formed and shoots may be induced from callus and subsequently rooted. Alternatively, embryo formation can be induced from the protoplast suspension. These embryos germinate as natural embryos to form plants. The culture media will generally contain various amino acids and hormones, such as auxin and cytokinins. It is also advantageous to add glutamic acid and proline to the medium, especially for such species as corn and alfalfa. Shoots and roots normally develop simultaneously. Efficient regeneration will depend on the medium, on the genotype, and on the history of the culture. If these three variables are controlled, then regeneration is fully reproducible and repeatable.

In some plant cell culture systems, the desired protein of the invention may be secreted or, alternatively, the protein may be extracted from the whole plant. Where the desired protein of the invention is secreted into the medium, it may be collected. Alternatively, the embryos and embryoless half-seeds or other plant tissue may be mechanically disrupted to release any secreted protein between cells and tissues. The mixture may be suspended in a buffer solution to retrieve soluble proteins. Conventional protein isolation and purification methods will then be used to purify the recombinant protein. Parameters of time, temperature, pH, oxygen, and volumes will be adjusted through routine methods to optimize expression and recovery of heterologous protein.
iv. Bacterial Systems

Bacterial expression techniques are known in the art. A bacterial promoter is any DNA sequence capable of binding bacterial RNA polymerase and initiating the downstream (3') transcription of a coding sequence (eg. structural gene) into mRNA. A promoter will have a transcription initiation region which is usually placed proximal to the 5' end of the coding sequence. This transcription initiation region usually includes an RNA polymerase binding site and a transcription initiation site. A bacterial promoter may also have a second domain called an operator, which may overlap an adjacent RNA polymerase binding site at which RNA synthesis begins. The operator permits negatively regulated (inducible) transcription, as a gene repressor protein may bind the operator and thereby inhibit transcription of a specific gene. Constitutive expression may occur in the absence of negative regulatory elements, such as the operator. In addition, positive regulation may be achieved by a gene activator protein binding sequence, which, if present, is usually proximal (5') to the RNA polymerase binding sequence. An example of a gene activator protein is the catabolite activator protein (CAP), which helps initiate transcription of the lac operon in Escherichia coli (E. coli) [Raibaud et al (1984) Annu. Rev. Genet. 18:173]. Regulated expression may therefore be either positive or negative, thereby either enhancing or reducing transcription.
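The interplay of negative control (a repressor bound to the operator) and positive control (an activator such as CAP bound upstream) can be summarized as a small truth table. The sketch below is a deliberately simplified, qualitative model of lac-type regulation: it assumes the repressor is released whenever inducer is present and that CAP is active only when glucose is scarce (CAP requires cAMP, which accumulates under glucose limitation), and it ignores leaky basal transcription and graded responses. The function name is hypothetical.

```python
def lac_transcription(inducer_present: bool, glucose_present: bool) -> str:
    """Qualitative transcription level from a lac-type promoter/operator.

    Negative control: without inducer, the repressor occupies the operator
    and blocks transcription.
    Positive control: CAP (with cAMP, which accumulates when glucose is
    scarce) binds upstream of the RNA polymerase site and boosts initiation.
    """
    repressor_bound = not inducer_present
    cap_active = not glucose_present
    if repressor_bound:
        return "off"
    return "high" if cap_active else "low"


if __name__ == "__main__":
    for inducer in (False, True):
        for glucose in (False, True):
            print(f"inducer={inducer!s:5} glucose={glucose!s:5} -> "
                  f"{lac_transcription(inducer, glucose)}")
```

The table shows why negative regulation acts as an on/off switch here, while positive regulation only modulates the level once the repressor has been released.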

Sequences encoding metabolic pathway enzymes provide particularly useful promoter sequences. Examples include promoter sequences derived from sugar-metabolizing enzymes, such as galactose, lactose (lac) [Chang et al (1977) Nature 198:1056], and maltose. Additional examples include promoter sequences derived from biosynthetic enzymes such as tryptophan (trp) [Goeddel et al (1980) Nucl. Acids Res. 8:4057; Yelverton et al (1981) Nucl. Acids Res. 9:731; US patent 4,738,921; EP-A-0036776 and EP-A-0121775]. The β-lactamase (bla) promoter system [Weissmann (1981) "The cloning of interferon and other mistakes." In Interferon 3 (ed. I. Gresser)], bacteriophage lambda PL [Shimatake et al (1981) Nature 292:128] and T5 [US patent 4,689,406] promoter systems also provide useful promoter sequences.

In addition, synthetic promoters which do not occur in nature also function as bacterial promoters. For example, transcription activation sequences of one bacterial or bacteriophage promoter may be joined with the operon sequences of another bacterial or bacteriophage promoter, creating a synthetic hybrid promoter [US patent 4,551,433]. For example, the tac promoter is a hybrid trp-lac promoter comprising both trp promoter and lac operon sequences that is regulated by the lac repressor [Amann et al (1983) Gene 25:167; de Boer et al (1983) Proc. Natl. Acad. Sci. 80:21].
Furthermore, a bacterial promoter can include naturally occurring promoters of non-bacterial origin that have the ability to bind bacterial RNA polymerase and initiate transcription. A naturally occurring promoter of non-bacterial origin can also be coupled with a compatible RNA polymerase to produce high levels of expression of some genes in prokaryotes. The bacteriophage T7 RNA polymerase/promoter system is an example of a coupled promoter system [Studier et al (1986) J. Mol. Biol. 189:113; Tabor et al (1985) Proc. Natl. Acad. Sci. 82:1074]. In addition, a hybrid promoter can also comprise a bacteriophage promoter and an E. coli operator region (EPO-A-0 267 851).

In addition to a functioning promoter sequence, an efficient ribosome binding site is also useful for the expression of foreign genes in prokaryotes. In E. coli, the ribosome binding site is called the Shine-Dalgarno (SD) sequence and includes an initiation codon (ATG) and a sequence 3-9 nucleotides in length located 3-11 nucleotides upstream of the initiation codon [Shine et al (1975) Nature 254:34]. The SD sequence is thought to promote binding of mRNA to the ribosome by the pairing of bases between the SD sequence and the 3' end of E. coli 16S rRNA [Steitz et al (1979) "Genetic signals and nucleotide sequences in messenger RNA." In Biological Regulation and Development: Gene Expression (ed. R.F. Goldberger)]. For the expression of eukaryotic genes and of prokaryotic genes with a weak ribosome-binding site, see Sambrook et al (1989) "Expression of cloned genes in Escherichia coli." In Molecular Cloning: A Laboratory Manual.
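The length and spacing rule for the SD sequence can be turned into a simple check. This sketch assumes AGGAGG as the SD consensus core (the classic E. coli consensus, complementary to the 3' end of 16S rRNA), accepts any contiguous stretch of at least 3 nt of it, and reads "located 3-11 nucleotides upstream" as a spacer of 3 to 11 nt between the end of the SD stretch and the A of the ATG. These parameters and the function names are illustrative assumptions, not a validated ribosome-binding-site predictor.

```python
SD_CONSENSUS = "AGGAGG"  # assumed consensus core


def _sd_motifs(min_len: int = 3) -> list[str]:
    """All contiguous pieces of the consensus of length >= min_len,
    longest first (ties broken alphabetically for determinism)."""
    pieces = {
        SD_CONSENSUS[i:j]
        for i in range(len(SD_CONSENSUS))
        for j in range(i + min_len, len(SD_CONSENSUS) + 1)
    }
    return sorted(pieces, key=lambda m: (-len(m), m))


def find_sd(mrna: str, start_index: int, min_spacer: int = 3,
            max_spacer: int = 11):
    """Look for an SD-like stretch upstream of the ATG at start_index.

    Returns (motif, motif_start, spacer) for the longest match found, or
    None. Accepts DNA or RNA (U is read as T).
    """
    seq = mrna.upper().replace("U", "T")
    if seq[start_index:start_index + 3] != "ATG":
        raise ValueError("start_index must point at an ATG/AUG codon")
    for motif in _sd_motifs():
        for spacer in range(min_spacer, max_spacer + 1):
            end = start_index - spacer        # exclusive end of the SD stretch
            begin = end - len(motif)
            if begin >= 0 and seq[begin:end] == motif:
                return motif, begin, spacer
    return None


if __name__ == "__main__":
    leader = "CCCC" + "AGGAGG" + "AAAAACC" + "ATG" + "GCTAAA"
    print(find_sd(leader, 17))  # ('AGGAGG', 4, 7)
```

Returning None flags a weak or absent ribosome-binding site, the situation for which the text refers to Sambrook et al.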

A DNA molecule may be expressed intracellularly. A promoter sequence may be directly linked with the DNA molecule, in which case the first amino acid at the N-terminus will always be a methionine, which is encoded by the ATG start codon. If desired, methionine at the N-terminus may be cleaved from the protein by in vitro incubation with cyanogen bromide or by either in vivo or in vitro incubation with a bacterial methionine N-terminal peptidase (EPO-A-0 219 237).

Fusion proteins provide an alternative to direct expression. Usually, a DNA sequence encoding the N-terminal portion of an endogenous bacterial protein, or other stable protein, is fused to the 5' end of heterologous coding sequences. Upon expression, this construct will provide a fusion of the two amino acid sequences. For example, the bacteriophage lambda cII gene can be linked at the 5' terminus of a foreign gene and expressed in bacteria. The resulting fusion protein preferably retains a site for a processing enzyme (factor Xa) to cleave the bacteriophage protein from the foreign gene [Nagai et al (1984) Nature 309:810]. Fusion proteins can also be made with sequences from the lacZ [Jia et al (1987) Gene 60:197], trpE [Allen et al. (1987) J. Biotechnol. 5:93; Makoff et al (1989) J. Gen. Microbiol. 135:11], and CheY [EP-A-0 324 647] genes. The DNA sequence at the junction of the two amino acid sequences may or may not encode a cleavable site. Another example is a ubiquitin fusion protein. Such a fusion protein is made with the ubiquitin region that preferably retains a site for a processing enzyme (eg. ubiquitin-specific processing protease) to cleave the ubiquitin from the foreign protein. Through this method, native foreign protein can be isolated [Miller et al. (1989) Bio/Technology 7:698].

Alternatively, foreign proteins can also be secreted from the cell by creating chimeric DNA molecules that encode a fusion protein comprising a signal peptide sequence fragment that provides for secretion of the foreign protein in bacteria [US patent 4,336,336]. The signal sequence fragment usually encodes a signal peptide comprising hydrophobic amino acids which direct the secretion of the protein from the cell. The protein is secreted either into the growth media (gram-positive bacteria) or into the periplasmic space, located between the inner and outer membranes of the cell (gram-negative bacteria). Preferably, processing sites that can be cleaved either in vivo or in vitro are encoded between the signal peptide fragment and the foreign gene.
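The hydrophobic character of a signal peptide can be quantified with a hydropathy scale. This sketch uses the Kyte-Doolittle scale and reports the highest mean hydropathy over a sliding window within the N-terminal region. The window size, region length and threshold are arbitrary illustrative assumptions, the function names are hypothetical, and the approach is not a substitute for dedicated signal-peptide predictors. The example uses the commonly cited E. coli OmpA signal peptide sequence, MKKTAIAIAVALAGFATVAQA, which is not quoted in the references above.

```python
# Kyte-Doolittle hydropathy values (Kyte & Doolittle, 1982).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def max_window_hydropathy(protein: str, window: int = 7, region: int = 30) -> float:
    """Highest mean hydropathy of any `window`-residue stretch within the
    first `region` residues. Only the 20 standard one-letter codes are
    accepted (others raise KeyError)."""
    head = protein.upper()[:region]
    if len(head) < window:
        raise ValueError("sequence shorter than the window")
    scores = [KYTE_DOOLITTLE[aa] for aa in head]
    return max(
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    )


def has_hydrophobic_core(protein: str, threshold: float = 1.5) -> bool:
    """Crude flag for a hydrophobic N-terminal stretch. The threshold is an
    illustrative assumption, not a published cut-off."""
    return max_window_hydropathy(protein) >= threshold


if __name__ == "__main__":
    ompa_signal = "MKKTAIAIAVALAGFATVAQA"
    print(round(max_window_hydropathy(ompa_signal), 2))  # 3.2
    print(has_hydrophobic_core(ompa_signal))             # True
    print(has_hydrophobic_core("MKDEKRSNQDEKRSTQ"))      # False
```

The same check could be applied to the yeast leader sequences discussed later, with the same caveats.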

DNA encoding suitable signal sequences can be derived from genes for secreted bacterial proteins, such as the E. coli outer membrane protein gene (ompA) [Masui et al. (1983), in: Experimental Manipulation of Gene Expression; Ghrayeb et al (1984) EMBO J. 3:2437] and the E. coli alkaline phosphatase signal sequence (phoA) [Oka et al (1985) Proc. Natl. Acad. Sci. 82:7212]. As an additional example, the signal sequence of the alpha-amylase gene from various Bacillus strains can be used to secrete heterologous proteins from B. subtilis [Palva et al (1982) Proc. Natl. Acad. Sci. USA 79:5582; EP-A-0 244 042].

Usually, transcription termination sequences recognized by bacteria are regulatory regions located 3' to the translation stop codon, and thus together with the promoter flank the coding sequence. These sequences direct the transcription of an mRNA which can be translated into the polypeptide encoded by the DNA. Transcription termination sequences frequently include DNA sequences of about 50 nucleotides capable of forming stem-loop structures that aid in terminating transcription. Examples include transcription termination sequences derived from genes with strong promoters, such as the trp gene in E. coli, as well as other biosynthetic genes.
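The stem-loop element of such terminators can be located as an inverted repeat. This sketch searches for a perfect inverted repeat (the stem) separated by a short loop and immediately followed by a run of T residues, a pattern typical of rho-independent terminators. The stem length, loop range and T-run length are illustrative assumptions, and the function names are hypothetical. The search requires perfect pairing, does not score GC content or folding energy, and reports several overlapping hits for one hairpin.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def find_terminator_hairpins(seq: str, stem_len: int = 6, min_loop: int = 3,
                             max_loop: int = 9,
                             t_run: int = 4) -> list[tuple[int, int, int]]:
    """Return (hairpin_start, hairpin_end, loop_len) for each perfect
    inverted repeat with a loop of min_loop..max_loop nt that is directly
    followed by t_run T residues. seq[hairpin_start:hairpin_end] spans
    both stem arms and the loop."""
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - 2 * stem_len - min_loop + 1):
        wanted = reverse_complement(seq[i:i + stem_len])
        for loop in range(min_loop, max_loop + 1):
            j = i + stem_len + loop          # start of the 3' arm
            right = seq[j:j + stem_len]
            if len(right) < stem_len:
                break                        # ran off the end of the sequence
            tail = seq[j + stem_len:j + stem_len + t_run]
            if right == wanted and tail == "T" * t_run:
                hits.append((i, j + stem_len, loop))
    return hits


if __name__ == "__main__":
    toy = "AAAA" + "GCCGCC" + "GAAA" + "GGCGGC" + "TTTTTT" + "AA"
    print(find_terminator_hairpins(toy))
    # [(2, 22, 8), (3, 21, 6), (4, 20, 4)]
```

The toy sequence yields three overlapping registers of the same hairpin, which illustrates why a practical tool would merge such hits and score pairing energy instead of relying on exact string matches.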

Usually, the above-described components, comprising a promoter, signal sequence (if desired), coding sequence of interest, and transcription termination sequence, are put together into expression constructs. Expression constructs are often maintained in a replicon, such as an extrachromosomal element (eg. plasmids) capable of stable maintenance in a host, such as bacteria. The replicon will have a replication system, thus allowing it to be maintained in a prokaryotic host either for expression or for cloning and amplification. In addition, a replicon may be either a high or low copy number plasmid. A high copy number plasmid will generally have a copy number ranging from about 5 to about 200, and usually about 10 to about 150. A host containing a high copy number plasmid will preferably contain at least about 10, and more preferably at least about 20 plasmids. Either a high or low copy number vector may be selected, depending upon the effect of the vector and the foreign protein on the host.

Alternatively, the expression constructs can be integrated into the bacterial genome with an integrating vector. Integrating vectors usually contain at least one sequence homologous to the bacterial chromosome that allows the vector to integrate. Integrations appear to result from recombinations between homologous DNA in the vector and the bacterial chromosome. For example, integrating vectors constructed with DNA from various Bacillus strains integrate into the Bacillus chromosome (EP-A-0 127 328). Integrating vectors may also comprise bacteriophage or transposon sequences.

Usually, extrachromosomal and integrating expression constructs may contain selectable markers to allow for the selection of bacterial strains that have been transformed. Selectable markers can be expressed in the bacterial host and may include genes which render bacteria resistant to drugs such as ampicillin, chloramphenicol, erythromycin, kanamycin (neomycin), and tetracycline [Davies et al (1978) Annu. Rev. Microbiol. 32:469]. Selectable markers may also include biosynthetic genes, such as those in the histidine, tryptophan, and leucine biosynthetic pathways.

Alternatively, some of the above-described components can be put together in transformation vectors. Transformation vectors usually comprise a selectable marker that is either maintained in a replicon or developed into an integrating vector, as described above.

Expression and transformation vectors, either extrachromosomal replicons or integrating vectors, have been developed for transformation into many bacteria. For example, expression vectors have been developed for, inter alia, the following bacteria: Bacillus subtilis [Palva et al (1982) Proc. Natl. Acad. Sci. USA 79:5582; EP-A-0 036 259 and EP-A-0 063 953; WO 84/04541], Escherichia coli [Shimatake et al (1981) Nature 292:128; Amann et al (1985) Gene 40:183; Studier et al (1986) J. Mol. Biol. 189:113; EP-A-0 036 776, EP-A-0 136 829 and EP-A-0 136 907], Streptococcus cremoris [Powell et al (1988) Appl. Environ. Microbiol. 54:655]; Streptococcus lividans [Powell et al (1988) Appl. Environ. Microbiol. 54:655], and Streptomyces lividans [US patent 4,745,056].

Methods of introducing exogenous DNA into bacterial hosts are well known in the art, and usually include either the transformation of bacteria treated with CaCl2 or other agents, such as divalent cations and DMSO. DNA can also be introduced into bacterial cells by electroporation. Transformation procedures usually vary with the bacterial species to be transformed. See eg. [Masson et al (1989) FEMS Microbiol. Lett. 60:273; Palva et al (1982) Proc. Natl. Acad. Sci. USA 79:5582; EP-A-0 036 259 and EP-A-0 063 953; WO 84/04541, Bacillus], [Miller et al (1988) Proc. Natl. Acad. Sci. 85:856; Wang et al (1990) J. Bacteriol. 172:949, Campylobacter], [Cohen et al (1973) Proc. Natl. Acad. Sci. 69:2110; Dower et al (1988) Nucleic Acids Res. 16:6127; Kushner (1978) "An improved method for transformation of Escherichia coli with ColE1-derived plasmids." In Genetic Engineering: Proceedings of the International Symposium on Genetic Engineering (eds. H.W. Boyer and S. Nicosia); Mandel et al (1970) J. Mol. Biol. 53:159; Taketo (1988) Biochim. Biophys. Acta 949:318; Escherichia], [Chassy et al (1987) FEMS Microbiol. Lett. 44:173, Lactobacillus]; [Fiedler et al (1988) Anal. Biochem. 170:38, Pseudomonas]; [Augustin et al (1990) FEMS Microbiol. Lett. 66:203, Staphylococcus], [Barany et al (1980) J. Bacteriol. 144:698; Harlander (1987) "Transformation of Streptococcus lactis by electroporation," in: Streptococcal Genetics (ed. J. Ferretti and R. Curtiss III); Perry et al (1981) Infect. Immun. 32:1295; Powell et al (1988) Appl. Environ. Microbiol. 54:655; Somkuti et al (1987) Proc. 4th Eur. Cong. Biotechnology 1:412, Streptococcus].

v. Yeast Expression 

Yeast expression systems are also known to one of ordinary skill in the art. A yeast promoter is any DNA sequence capable of binding yeast RNA polymerase and initiating the downstream (3') transcription of a coding sequence (eg. structural gene) into mRNA. A promoter will have a transcription initiation region which is usually placed proximal to the 5' end of the coding sequence. This transcription initiation region usually includes an RNA polymerase binding site (the "TATA Box") and a transcription initiation site. A yeast promoter may also have a second domain called an upstream activator sequence (UAS), which, if present, is usually distal to the structural gene. The UAS permits regulated (inducible) expression. Constitutive expression occurs in the absence of a UAS. Regulated expression may be either positive or negative, thereby either enhancing or reducing transcription.

Yeast is a fermenting organism with an active metabolic pathway; therefore, sequences encoding enzymes in the metabolic pathway provide particularly useful promoter sequences. Examples include alcohol dehydrogenase (ADH) (EP-A-0 284 044), enolase, glucokinase, glucose-6-phosphate isomerase, glyceraldehyde-3-phosphate dehydrogenase (GAP or GAPDH), hexokinase, phosphofructokinase, 3-phosphoglycerate mutase, and pyruvate kinase (PyK) (EPO-A-0 329 203). The yeast PHO5 gene, encoding acid phosphatase, also provides useful promoter sequences [Miyanohara et al (1983) Proc. Natl. Acad. Sci. USA 80:1].

In addition, synthetic promoters which do not occur in nature also function as yeast promoters. For example, UAS sequences of one yeast promoter may be joined with the transcription activation region of another yeast promoter, creating a synthetic hybrid promoter. Examples of such hybrid promoters include the ADH regulatory sequence linked to the GAP transcription activation region (US Patent Nos. 4,876,197 and 4,880,734). Other examples of hybrid promoters include promoters which consist of the regulatory sequences of either the ADH2, GAL4, GAL10, or PHO5 genes, combined with the transcriptional activation region of a glycolytic enzyme gene such as GAP or PyK (EP-A-0 164 556). Furthermore, a yeast promoter can include naturally occurring promoters of non-yeast origin that have the ability to bind yeast RNA polymerase and initiate transcription. Examples of such promoters are described in, inter alia, [Cohen et al (1980) Proc. Natl. Acad. Sci. USA 77:1078; Henikoff et al (1981) Nature 283:835; Hollenberg et al (1981) Curr. Topics Microbiol. Immunol. 96:119; Hollenberg et al (1979) "The Expression of Bacterial Antibiotic Resistance Genes in the Yeast Saccharomyces cerevisiae," in: Plasmids of Medical, Environmental and Commercial Importance (eds. K.N. Timmis and A. Puhler); Mercerau-Puigalon et al. (1980) Gene 11:163; Panthier et al (1980) Curr. Genet. 2:109].

A DNA molecule may be expressed intracellularly in yeast. A promoter sequence may be directly linked with the DNA molecule, in which case the first amino acid at the N-terminus of the recombinant protein will always be a methionine, which is encoded by the ATG start codon. If desired, methionine at the N-terminus may be cleaved from the protein by in vitro incubation with cyanogen bromide.
Fusion proteins provide an alternative for yeast expression systems, as well as in mammalian, baculovirus, and bacterial expression systems. Usually, a DNA sequence encoding the N-terminal portion of an endogenous yeast protein, or other stable protein, is fused to the 5' end of heterologous coding sequences. Upon expression, this construct will provide a fusion of the two amino acid sequences. For example, the yeast or human superoxide dismutase (SOD) gene can be linked at the 5' terminus of a foreign gene and expressed in yeast. The DNA sequence at the junction of the two amino acid sequences may or may not encode a cleavable site. See eg. EP-A-0 196 056. Another example is a ubiquitin fusion protein. Such a fusion protein is made with the ubiquitin region that preferably retains a site for a processing enzyme (eg. ubiquitin-specific processing protease) to cleave the ubiquitin from the foreign protein. Through this method, therefore, native foreign protein can be isolated (eg. WO88/024066).

Alternatively, foreign proteins can also be secreted from the cell into the growth media by creating
chimeric DNA molecules that encode a fusion protein comprised of a leader sequence fragment that
provides for secretion in yeast of the foreign protein. Preferably, there are processing sites encoded
between the leader fragment and the foreign gene that can be cleaved either in vivo or in vitro. The
leader sequence fragment usually encodes a signal peptide comprised of hydrophobic amino acids
which direct the secretion of the protein from the cell.

DNA encoding suitable signal sequences can be derived from genes for secreted yeast proteins,
such as the yeast invertase gene (EP-A-0 012 873; JPO. 62,096,086) and the A-factor gene (US
patent 4,588,684). Alternatively, leaders of non-yeast origin, such as an interferon leader, exist that
also provide for secretion in yeast (EP-A-0 060 057).

Preferred secretion leaders are those that employ a fragment of the yeast alpha-factor
gene, which contains both a "pre" signal sequence and a "pro" region. The types of alpha-factor
fragments that can be employed include the full-length pre-pro alpha-factor leader (about 83 amino
acid residues) as well as truncated alpha-factor leaders (usually about 25 to about 50 amino acid
residues) (US Patents 4,546,083 and 4,870,008; EP-A-0 324 274). Additional leaders employing
an alpha-factor leader fragment that provides for secretion include hybrid alpha-factor leaders made
with a presequence of a first yeast, but a pro-region from a second yeast alpha-factor (eg. see
WO 89/02463).
Usually, transcription termination sequences recognized by yeast are regulatory regions located 3'
to the translation stop codon, and thus together with the promoter flank the coding sequence. These
sequences direct the transcription of an mRNA which can be translated into the polypeptide
encoded by the DNA. Examples include transcription terminator sequences and other yeast-recognized
termination sequences, such as those from genes coding for glycolytic enzymes.

Usually, the above described components, comprising a promoter, leader (if desired), coding
sequence of interest, and transcription termination sequence, are put together into expression
constructs. Expression constructs are often maintained in a replicon, such as an extrachromosomal
element (eg. plasmids) capable of stable maintenance in a host, such as yeast or bacteria. The
replicon may have two replication systems, thus allowing it to be maintained, for example, in yeast
for expression and in a prokaryotic host for cloning and amplification. Examples of such
yeast-bacteria shuttle vectors include YEp24 [Botstein et al (1979) Gene 8:17-24], pC1/1 [Brake et al
(1984) Proc. Natl. Acad. Sci USA 81:4642-4646], and YRp17 [Stinchcomb et al (1982) J. Mol.
Biol. 158:157]. In addition, a replicon may be either a high or low copy number plasmid. A high
copy number plasmid will generally have a copy number ranging from about 5 to about 200, and
usually about 10 to about 150. A host containing a high copy number plasmid will preferably have
at least about 10, and more preferably at least about 20, copies. Either a high or low copy number
vector may be selected, depending upon the effect of the vector and the foreign protein on the host.
See eg. Brake et al., supra.

Alternatively, the expression constructs can be integrated into the yeast genome with an integrating
vector. Integrating vectors usually contain at least one sequence homologous to a yeast
chromosome that allows the vector to integrate, and preferably contain two homologous sequences
flanking the expression construct. Integrations appear to result from recombinations between
homologous DNA in the vector and the yeast chromosome [Orr-Weaver et al (1983) Methods in
Enzymol. 101:228-245]. An integrating vector may be directed to a specific locus in yeast by
selecting the appropriate homologous sequence for inclusion in the vector. See Orr-Weaver et al.,
supra. One or more expression constructs may integrate, possibly affecting levels of recombinant
protein produced [Rine et al (1983) Proc. Natl Acad. Sci. USA 80:6750]. The chromosomal
sequences included in the vector can occur either as a single segment in the vector, which results
in the integration of the entire vector, or as two segments homologous to adjacent segments in the
chromosome and flanking the expression construct in the vector, which can result in the stable
integration of only the expression construct.

Usually, extrachromosomal and integrating expression constructs may contain selectable markers
to allow for the selection of yeast strains that have been transformed. Selectable markers may
include biosynthetic genes that can be expressed in the yeast host, such as ADE2, HIS4, LEU2 and
TRP1, as well as ALG7 and the G418 resistance gene, which confer resistance in yeast cells to
tunicamycin and G418, respectively. In addition, a suitable selectable marker may also provide
yeast with the ability to grow in the presence of toxic compounds, such as metals. For example, the
presence of CUP1 allows yeast to grow in the presence of copper ions [Butt et al (1987) Microbiol.
Rev. 51:351].

Alternatively, some of the above described components can be put together into transformation 
vectors. Transformation vectors are usually comprised of a selectable marker that is either 
maintained in a replicon or developed into an integrating vector, as described above. 

Expression and transformation vectors, either extrachromosomal replicons or integrating vectors,
have been developed for transformation into many yeasts. For example, expression vectors have
been developed for, inter alia, the following yeasts: Candida albicans [Kurtz et al (1986) Mol.
Cell. Biol. 6:142], Candida maltosa [Kunze et al (1985) J. Basic Microbiol. 25:141], Hansenula
polymorpha [Gleeson et al (1986) J. Gen. Microbiol. 132:3459; Roggenkamp et al (1986) Mol.
Gen. Genet. 202:302], Kluyveromyces fragilis [Das et al (1984) J. Bacteriol. 158:1165],
Kluyveromyces lactis [De Louvencourt et al (1983) J. Bacteriol. 154:737; Van den Berg et al
(1990) Bio/Technology 8:135], Pichia guillerimondii [Kunze et al (1985) J. Basic Microbiol.
25:141], Pichia pastoris [Cregg et al (1985) Mol. Cell. Biol. 5:3376; US Patent Nos. 4,837,148
and 4,929,555], Saccharomyces cerevisiae [Hinnen et al (1978) Proc. Natl Acad. Sci. USA
75:1929; Ito et al (1983) J. Bacteriol. 153:163], Schizosaccharomyces pombe [Beach and Nurse
(1981) Nature 300:106], and Yarrowia lipolytica [Davidow et al (1985) Curr. Genet. 10:39;
Gaillardin et al (1985) Curr. Genet. 10:49].

Methods of introducing exogenous DNA into yeast hosts are well known in the art, and usually
include either the transformation of spheroplasts or of intact yeast cells treated with alkali cations.
Transformation procedures usually vary with the yeast species to be transformed. See eg. [Kurtz
et al (1986) Mol. Cell. Biol. 6:142; Kunze et al (1985) J. Basic Microbiol. 25:141; Candida];
[Gleeson et al (1986) J. Gen. Microbiol. 132:3459; Roggenkamp et al (1986) Mol. Gen. Genet.
202:302; Hansenula]; [Das et al (1984) J. Bacteriol. 158:1165; De Louvencourt et al (1983) J.
Bacteriol. 154:737; Van den Berg et al (1990) Bio/Technology 8:135; Kluyveromyces]; [Cregg
et al (1985) Mol. Cell. Biol. 5:3376; Kunze et al (1985) J. Basic Microbiol. 25:141; US Patent
Nos. 4,837,148 and 4,929,555; Pichia]; [Hinnen et al (1978) Proc. Natl Acad. Sci. USA 75:1929;
Ito et al (1983) J. Bacteriol. 153:163; Saccharomyces]; [Beach and Nurse (1981) Nature 300:106;
Schizosaccharomyces]; [Davidow et al (1985) Curr. Genet. 10:39; Gaillardin et al (1985) Curr.
Genet. 10:49; Yarrowia].

Antibodies 

As used herein, the term "antibody" refers to a polypeptide or group of polypeptides composed of
at least one antibody combining site. An "antibody combining site" is the three-dimensional
binding space with an internal surface shape and charge distribution complementary to the features
of an epitope of an antigen, which allows the antibody to bind the antigen. "Antibody"
includes, for example, vertebrate antibodies, hybrid antibodies, chimeric antibodies, humanised
antibodies, altered antibodies, univalent antibodies, Fab proteins, and single domain antibodies.

Antibodies against the proteins of the invention are useful for affinity chromatography, 
immunoassays, and distinguishing/identifying Neisserial proteins. 

Antibodies to the proteins of the invention, both polyclonal and monoclonal, may be prepared by
conventional methods. In general, the protein is first used to immunize a suitable animal, preferably
a mouse, rat, rabbit or goat. Rabbits and goats are preferred for the preparation of polyclonal sera
due to the volume of serum obtainable, and the availability of labeled anti-rabbit and anti-goat
antibodies. Immunization is generally performed by mixing or emulsifying the protein in saline,
preferably in an adjuvant such as Freund's complete adjuvant, and injecting the mixture or
emulsion parenterally (generally subcutaneously or intramuscularly). A dose of 50-200 μg/injection
is typically sufficient. Immunization is generally boosted 2-6 weeks later with one or more
injections of the protein in saline, preferably using Freund's incomplete adjuvant. One may
alternatively generate antibodies by in vitro immunization using methods known in the art, which
for the purposes of this invention is considered equivalent to in vivo immunization. Polyclonal
antisera are obtained by bleeding the immunized animal into a glass or plastic container, incubating
the blood at 25°C for one hour, followed by incubating at 4°C for 2-18 hours. The serum is
recovered by centrifugation (eg. 1,000g for 10 minutes). About 20-50 ml per bleed may be obtained
from rabbits.
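The centrifugation step above is specified as a relative centrifugal force (1,000g) rather than a rotor speed, and the speed needed to reach it depends on the rotor radius. A minimal Python sketch of the conversion, using the standard relation RCF = 1.118 × 10^-5 × r × N² (r in centimetres, N in revolutions per minute), is shown below; the function names are illustrative and the 10 cm radius is a hypothetical value, not one taken from this description.

```python
import math

# Standard empirical constant for RCF = K * r_cm * rpm**2
K = 1.118e-5


def rcf_from_rpm(rpm: float, radius_cm: float) -> float:
    """Relative centrifugal force (x g) at a given speed and rotor radius."""
    return K * radius_cm * rpm ** 2


def rpm_from_rcf(rcf: float, radius_cm: float) -> float:
    """Rotor speed (rpm) needed to reach a target RCF at the given radius."""
    return math.sqrt(rcf / (K * radius_cm))


# Hypothetical rotor with a 10 cm radius: speed needed for 1,000g
speed = rpm_from_rcf(1000, 10)
print(round(speed))  # about 2991 rpm
```

Quoting g-force rather than rpm, as the text does, keeps the protocol independent of the particular centrifuge used.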

Monoclonal antibodies are prepared using the standard method of Kohler & Milstein [Nature
(1975) 256:495-96], or a modification thereof. Typically, a mouse or rat is immunized as described
above. However, rather than bleeding the animal to extract serum, the spleen (and optionally
several large lymph nodes) is removed and dissociated into single cells. If desired, the spleen cells
may be screened (after removal of nonspecifically adherent cells) by applying a cell suspension to
a plate or well coated with the protein antigen. B-cells expressing membrane-bound
immunoglobulin specific for the antigen bind to the plate, and are not rinsed away with the rest of
the suspension. Resulting B-cells, or all dissociated spleen cells, are then induced to fuse with
myeloma cells to form hybridomas, and are cultured in a selective medium (eg. hypoxanthine,
aminopterin, thymidine medium, "HAT"). The resulting hybridomas are plated by limiting dilution,
and are assayed for the production of antibodies which bind specifically to the immunizing antigen
(and which do not bind to unrelated antigens). The selected MAb-secreting hybridomas are then
cultured either in vitro (eg. in tissue culture bottles or hollow fiber reactors), or in vivo (as ascites
in mice).
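Limiting dilution is commonly interpreted by assuming the cells are distributed among wells according to a Poisson distribution, which is what justifies treating wells with growth as probably monoclonal. A minimal Python sketch follows, assuming random seeding and that every seeded cell grows; it estimates the mean number of cells per well from the fraction of wells without growth, and the share of growing wells seeded by a single cell. The function names are hypothetical, and the statistics are a standard interpretation rather than a step stated in this description.

```python
import math


def poisson_pmf(k: int, mean: float) -> float:
    """Probability that a well receives exactly k cells."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


def mean_cells_per_well(fraction_negative: float) -> float:
    """Estimate mean cells/well from the fraction of wells with no growth,
    using P(0) = exp(-mean)."""
    if not 0.0 < fraction_negative < 1.0:
        raise ValueError("fraction_negative must be strictly between 0 and 1")
    return -math.log(fraction_negative)


def monoclonal_fraction(mean: float) -> float:
    """Fraction of wells with growth that were seeded by exactly one cell."""
    if mean <= 0:
        raise ValueError("mean must be positive")
    return poisson_pmf(1, mean) / (1.0 - poisson_pmf(0, mean))


# Example: if 61% of wells show no growth, the mean is about 0.49 cells/well
m = mean_cells_per_well(0.61)
print(round(m, 2), round(monoclonal_fraction(m), 2))
```

Lower seeding densities raise the monoclonal share of positive wells at the cost of more empty wells, which is why hybridomas are typically re-cloned.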

If desired, the antibodies (whether polyclonal or monoclonal) may be labeled using conventional
techniques. Suitable labels include fluorophores, chromophores, radioactive atoms (particularly 32P
and 125I), electron-dense reagents, enzymes, and ligands having specific binding partners. Enzymes
are typically detected by their activity. For example, horseradish peroxidase (HRP) is usually detected
by its ability to convert 3,3',5,5'-tetramethylbenzidine (TMB) to a blue pigment, quantifiable with a
spectrophotometer. "Specific binding partner" refers to a protein capable of binding a ligand molecule
with high specificity, as for example in the case of an antigen and a monoclonal antibody specific
therefor. Other specific binding partners include biotin and avidin or streptavidin, IgG and protein A,
and the numerous receptor-ligand couples known in the art. It should be understood that the above
description is not meant to categorize the various labels into distinct classes, as the same label may
serve in several different modes. For example, 125I may serve as a radioactive label or as an
electron-dense reagent. HRP may serve as an enzyme or as an antigen for a MAb. Further, one may
combine various labels for desired effect. For example, MAbs and avidin also require labels in the
practice of this invention: thus, one might label a MAb with biotin, and detect its presence with avidin
labeled with 125I, or with an anti-biotin MAb labeled with HRP. Other permutations and possibilities
will be readily apparent to those of ordinary skill in the art, and are considered as equivalents within
the scope of the instant invention.

Pharmaceutical Compositions 

Pharmaceutical compositions can comprise either polypeptides, antibodies, or nucleic acid of the
invention. The pharmaceutical compositions will comprise a therapeutically effective amount of
either polypeptides, antibodies, or polynucleotides of the claimed invention.

The term "therapeutically effective amount" as used herein refers to an amount of a therapeutic
agent to treat, ameliorate, or prevent a desired disease or condition, or to exhibit a detectable
therapeutic or preventative effect. The effect can be detected by, for example, chemical markers or
antigen levels. Therapeutic effects also include reduction in physical symptoms, such as decreased
body temperature. The precise effective amount for a subject will depend upon the subject's size
and health, the nature and extent of the condition, and the therapeutics or combination of
therapeutics selected for administration. Thus, it is not useful to specify an exact effective amount
in advance. However, the effective amount for a given situation can be determined by routine
experimentation and is within the judgement of the clinician.

For purposes of the present invention, an effective dose will be from about 0.01 mg/kg to 50 mg/kg,
or from 0.05 mg/kg to about 10 mg/kg, of the DNA constructs in the individual to which it is administered.
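Because the effective dose above is expressed per kilogram of body mass, the total amount for a given individual scales linearly with body mass. A minimal Python sketch of that arithmetic follows; the constant and function names are illustrative, the 70 kg body mass is a hypothetical value, and the calculation is no substitute for the clinician's judgement.

```python
# Dose ranges stated in the text, in mg of DNA construct per kg of body mass
BROAD_RANGE_MG_PER_KG = (0.01, 50.0)
NARROWER_RANGE_MG_PER_KG = (0.05, 10.0)


def total_dose_range_mg(body_mass_kg: float, range_mg_per_kg: tuple) -> tuple:
    """Convert a per-kg dose range into a total dose range (mg) for one subject."""
    if body_mass_kg <= 0:
        raise ValueError("body mass must be positive")
    low, high = range_mg_per_kg
    return low * body_mass_kg, high * body_mass_kg


# Hypothetical 70 kg subject
broad = total_dose_range_mg(70, BROAD_RANGE_MG_PER_KG)        # about 0.7 to 3500 mg
narrower = total_dose_range_mg(70, NARROWER_RANGE_MG_PER_KG)  # about 3.5 to 700 mg
print(broad, narrower)
```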

A pharmaceutical composition can also contain a pharmaceutically acceptable carrier. The term
"pharmaceutically acceptable carrier" refers to a carrier for administration of a therapeutic agent, such
as antibodies or a polypeptide, genes, and other therapeutic agents. The term refers to any
pharmaceutical carrier that does not itself induce the production of antibodies harmful to the
individual receiving the composition, and which may be administered without undue toxicity. Suitable
carriers may be large, slowly metabolized macromolecules such as proteins, polysaccharides,
polylactic acids, polyglycolic acids, polymeric amino acids, amino acid copolymers, and inactive virus
particles. Such carriers are well known to those of ordinary skill in the art.

Pharmaceutically acceptable salts can be used therein, for example, mineral acid salts such as 
hydrochlorides, hydrobromides, phosphates, sulfates, and the like; and the salts of organic acids 
such as acetates, propionates, malonates, benzoates, and the like. A thorough discussion of 
pharmaceutically acceptable excipients is available in Remington's Pharmaceutical Sciences (Mack
Pub. Co., N.J. 1991).
Pharmaceutically acceptable carriers in therapeutic compositions may contain liquids such as water,
saline, glycerol and ethanol. Additionally, auxiliary substances, such as wetting or emulsifying agents,
pH buffering substances, and the like, may be present in such vehicles. Typically, the therapeutic
compositions are prepared as injectables, either as liquid solutions or suspensions; solid forms suitable
for solution in, or suspension in, liquid vehicles prior to injection may also be prepared. Liposomes
are included within the definition of a pharmaceutically acceptable carrier.

Delivery Methods 

Once formulated, the compositions of the invention can be administered directly to the subject. The 
subjects to be treated can be animals; in particular, human subjects can be treated. 

Direct delivery of the compositions will generally be accomplished by injection, either
subcutaneously, intraperitoneally, intravenously or intramuscularly, or by delivery to the interstitial
space of a tissue. The compositions can also be administered into a lesion. Other modes of
administration include oral and pulmonary administration, suppositories, and transdermal or
transcutaneous applications (eg. see WO98/20734), needles, and gene guns or hyposprays. Dosage
treatment may be a single dose schedule or a multiple dose schedule.

Vaccines 

Vaccines according to the invention may either be prophylactic (ie. to prevent infection) or 
therapeutic (ie. to treat disease after infection). 

Such vaccines comprise immunising antigen(s), immunogen(s), polypeptide(s), protein(s) or nucleic acid,
usually in combination with "pharmaceutically acceptable carriers," which include any carrier that does
not itself induce the production of antibodies harmful to the individual receiving the composition.
Suitable carriers are typically large, slowly metabolized macromolecules such as proteins,
polysaccharides, polylactic acids, polyglycolic acids, polymeric amino acids, amino acid copolymers,
lipid aggregates (such as oil droplets or liposomes), and inactive virus particles. Such carriers are well
known to those of ordinary skill in the art. Additionally, these carriers may function as
immunostimulating agents ("adjuvants"). Furthermore, the antigen or immunogen may be conjugated to
a bacterial toxoid, such as a toxoid from diphtheria, tetanus, cholera, H. pylori, etc. pathogens.

Preferred adjuvants to enhance effectiveness of the composition include, but are not limited to: (1)
aluminum salts (alum), such as aluminum hydroxide, aluminum phosphate, aluminum sulfate, etc;
(2) oil-in-water emulsion formulations (with or without other specific immunostimulating agents
such as muramyl peptides (see below) or bacterial cell wall components), such as for example (a)
MF59™ (WO 90/14837; Chapter 10 in Vaccine design: the subunit and adjuvant approach, eds.
Powell & Newman, Plenum Press 1995), containing 5% Squalene, 0.5% Tween 80, and 0.5% Span
85 (optionally containing various amounts of MTP-PE (see below), although not required)
formulated into submicron particles using a microfluidizer such as Model 110Y microfluidizer
(Microfluidics, Newton, MA), (b) SAF, containing 10% Squalane, 0.4% Tween 80, 5%
pluronic-blocked polymer L121, and thr-MDP (see below) either microfluidized into a submicron emulsion
or vortexed to generate a larger particle size emulsion, and (c) Ribi™ adjuvant system (RAS), (Ribi
Immunochem, Hamilton, MT) containing 2% Squalene, 0.2% Tween 80, and one or more bacterial
cell wall components from the group consisting of monophosphoryl lipid A (MPL), trehalose
dimycolate (TDM), and cell wall skeleton (CWS), preferably MPL + CWS (Detox™); (3) saponin
adjuvants, such as Stimulon™ (Cambridge Bioscience, Worcester, MA), or particles generated
therefrom such as ISCOMs (immunostimulating complexes); (4) Complete Freund's
Adjuvant (CFA) and Incomplete Freund's Adjuvant (IFA); (5) cytokines, such as interleukins (eg.
IL-1, IL-2, IL-4, IL-5, IL-6, IL-7, IL-12, etc.), interferons (eg. gamma interferon), macrophage
colony stimulating factor (M-CSF), tumor necrosis factor (TNF), etc; and (6) other substances that
act as immunostimulating agents to enhance the effectiveness of the composition. Alum and
MF59™ are preferred.

As mentioned above, muramyl peptides include, but are not limited to,
N-acetyl-muramyl-L-threonyl-D-isoglutamine (thr-MDP), N-acetyl-normuramyl-L-alanyl-D-isoglutamine (nor-MDP),
N-acetylmuramyl-L-alanyl-D-isoglutaminyl-L-alanine-2-(1'-2'-dipalmitoyl-sn-glycero-3-hydroxyphosphoryloxy)-ethylamine
(MTP-PE), etc.
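The MF59™ and SAF compositions listed above are stated as percentages, so preparing a batch reduces to scaling each component by the batch volume. A minimal Python sketch follows, assuming the percentages are weight per volume (grams per 100 ml), which the text does not state; the names are illustrative, and the optional MTP-PE of MF59™ and the thr-MDP of SAF are left out because no amounts are given for them.

```python
# Oil and surfactant contents from the text, in percent.
# Assumption: % is w/v (g per 100 ml); the description does not say.
FORMULATIONS = {
    "MF59": {"squalene": 5.0, "Tween 80": 0.5, "Span 85": 0.5},
    "SAF": {"squalane": 10.0, "Tween 80": 0.4, "pluronic L121": 5.0},
}


def component_grams(formulation: str, batch_ml: float) -> dict:
    """Grams of each listed component needed for a batch of the given volume."""
    if batch_ml <= 0:
        raise ValueError("batch volume must be positive")
    return {
        component: percent * batch_ml / 100.0
        for component, percent in FORMULATIONS[formulation].items()
    }


print(component_grams("MF59", 250))
# {'squalene': 12.5, 'Tween 80': 1.25, 'Span 85': 1.25}
```

If the percentages are instead volume per volume, the same scaling gives millilitres rather than grams.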

The immunogenic compositions (eg. the immunising antigen/immunogen/polypeptide/protein/nucleic acid,
pharmaceutically acceptable carrier, and adjuvant) typically will contain diluents, such
as water, saline, glycerol, ethanol, etc. Additionally, auxiliary substances, such as wetting or
emulsifying agents, pH buffering substances, and the like, may be present in such vehicles.

Typically, the immunogenic compositions are prepared as injectables, either as liquid solutions or
suspensions; solid forms suitable for solution in, or suspension in, liquid vehicles prior to injection
may also be prepared. The preparation also may be emulsified or encapsulated in liposomes for
enhanced adjuvant effect, as discussed above under pharmaceutically acceptable carriers.
Immunogenic compositions used as vaccines comprise an immunologically effective amount of the
antigenic or immunogenic polypeptides, as well as any other of the above-mentioned components,
as needed. By "immunologically effective amount", it is meant that the administration of that
amount to an individual, either in a single dose or as part of a series, is effective for treatment or
prevention. This amount varies depending upon the health and physical condition of the individual
to be treated, the taxonomic group of the individual to be treated (eg. nonhuman primate, primate, etc.),
the capacity of the individual's immune system to synthesize antibodies, the degree of protection
desired, the formulation of the vaccine, the treating doctor's assessment of the medical situation,
and other relevant factors. It is expected that the amount will fall in a relatively broad range that
can be determined through routine trials.

The immunogenic compositions are conventionally administered parenterally, eg. by injection, 
either subcutaneously, intramuscularly, or transdermally/transcutaneously (eg. WO98/20734). 
Additional formulations suitable for other modes of administration include oral and pulmonary 
formulations, suppositories, and transdermal applications. Dosage treatment may be a single dose
schedule or a multiple dose schedule. The vaccine may be administered in conjunction with other
immunoregulatory agents. 

As an alternative to protein-based vaccines, DNA vaccination may be employed [eg. Robinson & 
Torres (1997) Seminars in Immunology 9:271-283; Donnelly et al. (1997) Annu Rev Immunol 
15:617-648; see later herein]. 

Gene Delivery Vehicles

Gene therapy vehicles for delivery of constructs including a coding sequence of a therapeutic of
the invention, to be delivered to the mammal for expression in the mammal, can be administered
either locally or systemically. These constructs can utilize viral or non-viral vector approaches in
an in vivo or ex vivo modality. Expression of such coding sequence can be induced using endogenous
mammalian or heterologous promoters. Expression of the coding sequence in vivo can be either
constitutive or regulated.

The invention includes gene delivery vehicles capable of expressing the contemplated nucleic acid
sequences. The gene delivery vehicle is preferably a viral vector and, more preferably, a retroviral,
adenoviral, adeno-associated viral (AAV), herpes viral, or alphavirus vector. The viral vector can
also be an astrovirus, coronavirus, orthomyxovirus, papovavirus, paramyxovirus, parvovirus,
picornavirus, poxvirus, or togavirus viral vector. See generally, Jolly (1994) Cancer Gene Therapy
1:51-64; Kimura (1994) Human Gene Therapy 5:845-852; Connelly (1995) Human Gene Therapy
6:185-193; and Kaplitt (1994) Nature Genetics 6:148-153.

Retroviral vectors are well known in the art and we contemplate that any retroviral gene therapy vector
is employable in the invention, including B, C and D type retroviruses, xenotropic retroviruses (for
example, NZB-X1, NZB-X2 and NZB9-1; see O'Neill (1985) J. Virol. 53:160), polytropic retroviruses
(eg. MCF and MCF-MLV; see Kelly (1983) J. Virol. 45:291), spumaviruses and lentiviruses. See RNA
Tumor Viruses, Second Edition, Cold Spring Harbor Laboratory, 1985.

Portions of the retroviral gene therapy vector may be derived from different retroviruses. For
example, retrovector LTRs may be derived from a Murine Sarcoma Virus, a tRNA binding site
from a Rous Sarcoma Virus, a packaging signal from a Murine Leukemia Virus, and an origin of
second strand synthesis from an Avian Leukosis Virus.

These recombinant retroviral vectors may be used to generate transduction competent retroviral
vector particles by introducing them into appropriate packaging cell lines (see US patent
5,591,624). Retrovirus vectors can be constructed for site-specific integration into host cell DNA
by incorporation of a chimeric integrase enzyme into the retroviral particle (see WO96/37626). It
is preferable that the recombinant viral vector is a replication defective recombinant virus.

Packaging cell lines suitable for use with the above-described retrovirus vectors are well known
in the art, are readily prepared (see WO95/30763 and WO92/05266), and can be used to create
producer cell lines (also termed vector cell lines or "VCLs") for the production of recombinant
vector particles. Preferably, the packaging cell lines are made from human parent cells (eg. HT1080
cells) or mink parent cell lines, which eliminates inactivation in human serum.

Preferred retroviruses for the construction of retroviral gene therapy vectors include Avian
Leukosis Virus, Bovine Leukemia Virus, Murine Leukemia Virus, Mink-Cell Focus-Inducing
Virus, Murine Sarcoma Virus, Reticuloendotheliosis Virus and Rous Sarcoma Virus. Particularly
preferred Murine Leukemia Viruses include 4070A and 1504A (Hartley and Rowe (1976) J Virol
19:19-25), Abelson (ATCC No. VR-999), Friend (ATCC No. VR-245), Graffi, Gross (ATCC No.
VR-590), Kirsten, Harvey Sarcoma Virus and Rauscher (ATCC No. VR-998) and Moloney Murine
Leukemia Virus (ATCC No. VR-190). Such retroviruses may be obtained from depositories or
collections such as the American Type Culture Collection ("ATCC") in Rockville, Maryland or
isolated from known sources using commonly available techniques.

Exemplary known retroviral gene therapy vectors employable in this invention include those
described in patent applications GB2200651, EP0415731, EP0345242, EP0334301, WO89/02468,
WO89/05349, WO89/09271, WO90/02806, WO90/07936, WO94/03622, WO93/25698,
WO93/25234, WO93/11230, WO93/10218, WO91/02805, WO91/02825, WO95/07994, US
5,219,740, US 4,405,712, US 4,861,719, US 4,980,289, US 4,777,127 and US 5,591,624. See also Vile
(1993) Cancer Res 53:3860-3864; Vile (1993) Cancer Res 53:962-967; Ram (1993) Cancer Res
53:83-88; Takamiya (1992) J Neurosci Res 33:493-503; Baba (1993) J Neurosurg
79:729-735; Mann (1983) Cell 33:153; Cane (1984) Proc Natl Acad Sci 81:6349; and Miller (1990)
Human Gene Therapy 1.

Human adenoviral gene therapy vectors are also known in the art and employable in this invention.
See, for example, Berkner (1988) Biotechniques 6:616 and Rosenfeld (1991) Science 252:431, and
WO93/07283, WO93/06223, and WO93/07282. Exemplary known adenoviral gene therapy vectors
employable in this invention include those described in the above referenced documents and in
WO94/12649, WO93/03769, WO93/19191, WO94/28938, WO95/11984, WO95/00655,
WO95/27071, WO95/29993, WO95/34671, WO96/05320, WO94/08026, WO94/11506,
WO93/06223, WO94/24299, WO95/14102, WO95/24297, WO95/02697, WO94/28152,
WO94/24299, WO95/09241, WO95/25807, WO95/05835, WO94/18922 and WO95/09654.

Alternatively, administration of DNA linked to killed adenovirus as described in Curiel (1992)
Hum. Gene Ther. 3:147-154 may be employed. The gene delivery vehicles of the invention also
include adeno-associated virus (AAV) vectors. Leading and preferred examples of such
vectors for use in this invention are the AAV-2 based vectors disclosed in Srivastava,
WO93/09239. Most preferred AAV vectors comprise the two AAV inverted terminal repeats in
which the native D-sequences are modified by substitution of nucleotides, such that from 5 to 18
native nucleotides, preferably from 10 to 18 native nucleotides, and most preferably 10 native
nucleotides, are retained and the remaining nucleotides of the D-sequence are deleted or replaced
with non-native nucleotides. The native D-sequences of the AAV inverted terminal repeats are
sequences of 20 consecutive nucleotides in each AAV inverted terminal repeat (ie. there is one
sequence at each end) which are not involved in hairpin (HP) formation. The non-native
replacement nucleotide may be any nucleotide other than the nucleotide found in the native
D-sequence in the same position. Other employable exemplary AAV vectors are pWP-19 and
pWN-1, both of which are disclosed in Nahreini (1993) Gene 124:257-262. Another example of
such an AAV vector is psub201 (see Samulski (1987) J. Virol. 61:3096). Another exemplary AAV
vector is the Double-D ITR vector. Construction of the Double-D ITR vector is disclosed in US
Patent 5,478,745. Still other vectors are those disclosed in Carter US Patent 4,797,368,
Muzyczka US Patent 5,139,941, Chatterjee US Patent 5,474,935, and Kotin WO94/288157. Yet a
further example of an AAV vector employable in this invention is SSV9AFABTKneo, which
contains the AFP enhancer and albumin promoter and directs expression predominantly in the liver.
Its structure and construction are disclosed in Su (1996) Human Gene Therapy 7:463-470.
Additional AAV gene therapy vectors are described in US 5,354,678, US 5,173,414, US 5,139,941,
and US 5,252,479.
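The D-sequence modification described above for the most preferred AAV vectors amounts to a simple sequence operation: of the 20 native nucleotides, between 5 and 18 (most preferably 10) are kept, and each remaining position is deleted or given a nucleotide different from the native one at that position. A minimal Python sketch follows. The text does not say which positions are retained, so keeping the first `retain` nucleotides is an assumption, as is the fixed substitution table used to pick a non-native nucleotide; the sketch also simplifies to deleting or replacing all remaining positions, and the input in the example is a hypothetical sequence, not an actual AAV D-sequence.

```python
# Assumption: any nucleotide other than the native one is acceptable, so a
# fixed cyclic substitution is used to guarantee a non-native base.
SUBSTITUTE = {"A": "C", "C": "G", "G": "T", "T": "A"}


def modify_d_sequence(native: str, retain: int = 10, delete_rest: bool = False) -> str:
    """Keep the first `retain` native nucleotides of a 20-nt D-sequence and
    either delete the rest or replace each with a non-native nucleotide."""
    native = native.upper()
    if len(native) != 20:
        raise ValueError("a D-sequence is 20 nucleotides long")
    if set(native) - set(SUBSTITUTE):
        raise ValueError("sequence may contain only A, C, G and T")
    if not 5 <= retain <= 18:
        raise ValueError("between 5 and 18 native nucleotides must be retained")
    kept = native[:retain]
    if delete_rest:
        return kept
    return kept + "".join(SUBSTITUTE[nt] for nt in native[retain:])


example = "ACGT" * 5  # hypothetical 20-nt input
print(modify_d_sequence(example))
print(modify_d_sequence(example, retain=5, delete_rest=True))
```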

The gene therapy vectors of the invention also include herpes vectors. Leading and preferred
examples are herpes simplex virus vectors containing a sequence encoding a thymidine kinase
polypeptide such as those disclosed in US 5,288,641 and EP0176170 (Roizman). Additional
exemplary herpes simplex virus vectors include HFEM/ICP6-LacZ disclosed in WO95/04139
(Wistar Institute), pHSVlac described in Geller (1988) Science 241:1667-1669 and in WO90/09441
and WO92/07945, HSV Us3::pgC-lacZ described in Fink (1992) Human Gene Therapy 3:11-19,
HSV 7134, 2 RH 105 and GAL4 described in EP 0453242 (Breakefield), and those deposited
with the ATCC as accession numbers ATCC VR-977 and ATCC VR-260.

Also contemplated are alpha virus gene therapy vectors that can be employed in this invention.
Preferred alpha virus vectors are Sindbis virus vectors, togaviruses, Semliki Forest virus (ATCC
VR-67; ATCC VR-1247), Middelburg virus (ATCC VR-370), Ross River virus (ATCC VR-373;
ATCC VR-1246), Venezuelan equine encephalitis virus (ATCC VR-923; ATCC VR-1250; ATCC
VR-1249; ATCC VR-532), and those described in US patents 5,091,309 and 5,217,879 and
WO92/10578. More particularly, those alpha virus vectors described in US Serial No. 08/405,627,
filed March 15, 1995, WO94/21792, WO92/10578, WO95/07994, US 5,091,309 and US 5,217,879
are employable. Such alpha viruses may be obtained from depositories or collections such as the
ATCC in Rockville, Maryland or isolated from known sources using commonly available
techniques. Preferably, alphavirus vectors with reduced cytotoxicity are used (see USSN
08/679640).

DNA vector systems such as eukaryotic layered expression systems are also useful for expressing the nucleic acids of the invention. See WO95/07994 for a detailed description of eukaryotic layered expression systems. Preferably, the eukaryotic layered expression systems of the invention are derived from alphavirus vectors and most preferably from Sindbis viral vectors.

Other viral vectors suitable for use in the present invention include those derived from poliovirus, for example ATCC VR-58 and those described in Evans, Nature 339 (1989) 385 and Sabin (1973) J. Biol. Standardization 1:115; rhinovirus, for example ATCC VR-1110 and those described in Arnold (1990) J Cell Biochem L401; pox viruses such as canary pox virus or vaccinia virus, for example ATCC VR-111 and ATCC VR-2010 and those described in Fisher-Hoch (1989) Proc Natl Acad Sci 86:317; Flexner (1989) Ann NY Acad Sci 569:86, Flexner (1990) Vaccine 8:17; in US 4,603,112 and US 4,769,330 and WO89/01973; SV40 virus, for example ATCC VR-305 and those described in Mulligan (1979) Nature 277:108 and Madzak (1992) J Gen Virol 73:1533; influenza virus, for example ATCC VR-797 and recombinant influenza viruses made employing reverse genetics techniques as described in US 5,166,057 and in Enami (1990) Proc Natl Acad Sci 87:3802-3805; Enami & Palese (1991) J Virol 65:2711-2713 and Luytjes (1989) Cell 59:110 (see also McMichael (1983) NEJMed 309:13, and Yap (1978) Nature 273:238 and Nature (1979) 277:108); human immunodeficiency virus as described in EP-0386882 and in Buchschacher (1992) J. Virol. 66:2731; measles virus, for example ATCC VR-67 and VR-1247 and those described in EP-0440219; Aura virus, for example ATCC VR-368; Bebaru virus, for example ATCC VR-600 and ATCC VR-1240; Cabassou virus, for example ATCC VR-922; Chikungunya virus, for example ATCC VR-64 and ATCC VR-1241; Fort Morgan Virus, for example ATCC VR-924; Getah virus, for example ATCC VR-369 and ATCC VR-1243; Kyzylagach virus, for example ATCC VR-927; Mayaro virus, for example ATCC VR-66; Mucambo virus, for example ATCC VR-580 and ATCC VR-1244; Ndumu virus, for example ATCC VR-371; Pixuna virus, for example ATCC VR-372 and ATCC VR-1245; Tonate virus, for example ATCC VR-925; Triniti virus, for example ATCC VR-469; Una virus, for example ATCC VR-374; Whataroa virus, for example ATCC VR-926; Y-62-33 virus, for example ATCC VR-375; O'Nyong virus; Eastern encephalitis virus, for example ATCC VR-65 and ATCC VR-1242; Western encephalitis virus, for example ATCC VR-70, ATCC VR-1251, ATCC VR-622 and ATCC VR-1252; and coronavirus, for example ATCC VR-740 and those described in Hamre (1966) Proc Soc Exp Biol Med 121:190.

Delivery of the compositions of this invention into cells is not limited to the above-mentioned viral vectors. Other delivery methods and media may be employed, such as: nucleic acid expression vectors; polycationic condensed DNA linked or unlinked to killed adenovirus alone (see, for example, US Serial No. 08/366,787, filed December 30, 1994, and Curiel (1992) Hum Gene Ther 3:147-154); ligand-linked DNA (see, for example, Wu (1989) J Biol Chem 264:16985-16987); eukaryotic cell delivery vehicles (see, for example, US Serial No. 08/240,030, filed May 9, 1994, and US Serial No. 08/404,796); deposition of photopolymerized hydrogel materials; a hand-held gene transfer particle gun, as described in US Patent 5,149,655; ionizing radiation, as described in US 5,206,152 and in WO92/11033; nucleic charge neutralization; or fusion with cell membranes. Additional approaches are described in Philip (1994) Mol Cell Biol 14:2411-2418 and in Woffendin (1994) Proc Natl Acad Sci 91:11581-11585.

Particle mediated gene transfer may be employed; see, for example, US Serial No. 60/023,867. Briefly, the sequence can be inserted into conventional vectors that contain conventional control sequences for high level expression, and then incubated with synthetic gene transfer molecules such as polymeric DNA-binding cations like polylysine, protamine, and albumin, linked to cell targeting ligands such as asialoorosomucoid, as described in Wu & Wu (1987) J. Biol Chem. 262:4429-4432; insulin, as described in Huckett (1990) Biochem Pharmacol 40:253-263; galactose, as described in Plank (1992) Bioconjugate Chem 3:533-539; lactose; or transferrin.

Naked DNA may also be employed. Exemplary naked DNA introduction methods are described in WO 90/11092 and US 5,580,859. Uptake efficiency may be improved using biodegradable latex beads. DNA-coated latex beads are efficiently transported into cells after endocytosis initiation by the beads. The method may be improved further by treatment of the beads to increase hydrophobicity and thereby facilitate disruption of the endosome and release of the DNA into the cytoplasm.

Liposomes that can act as gene delivery vehicles are described in US 5,422,120, WO95/13796, WO94/23697, WO91/14445 and EP-524,968. As described in USSN 60/023,867, on non-viral delivery, the nucleic acid sequences encoding a polypeptide can be inserted into conventional vectors that contain conventional control sequences for high level expression, and then be incubated with synthetic gene transfer molecules such as polymeric DNA-binding cations like polylysine, protamine, and albumin, linked to cell targeting ligands such as asialoorosomucoid, insulin, galactose, lactose, or transferrin. Other delivery systems include the use of liposomes to encapsulate DNA comprising the gene under the control of a variety of tissue-specific or ubiquitously-active promoters. Further non-viral delivery suitable for use includes mechanical delivery systems such as the approach described in Woffendin et al (1994) Proc. Natl Acad. Sci. USA 91(24):11581-11585. Moreover, the coding sequence and the product of expression of such can be delivered through deposition of photopolymerized hydrogel materials. Other conventional methods for gene delivery that can be used for delivery of the coding sequence include, for example, use of a hand-held gene transfer particle gun, as described in US 5,149,655, and use of ionizing radiation for activating the transferred gene, as described in US 5,206,152 and WO92/11033.

Exemplary liposome and polycationic gene delivery vehicles are those described in US 5,422,120 and US 4,762,915; in WO95/13796, WO94/23697, and WO91/14445; in EP-0524968; and in Stryer, Biochemistry, pages 236-240 (1975) W.H. Freeman, San Francisco; Szoka (1980) Biochim Biophys Acta 600:1; Bayer (1979) Biochim Biophys Acta 550:464; Rivnay (1987) Meth Enzymol 149:119; Wang (1987) Proc Natl Acad Sci 84:7851; Plant (1989) Anal Biochem 176:420.

A polynucleotide composition can comprise a therapeutically effective amount of a gene therapy vehicle, as the term is defined above. For purposes of the present invention, an effective dose will be from about 0.01 mg/kg to 50 mg/kg, or from 0.05 mg/kg to about 10 mg/kg, of the DNA constructs in the individual to which it is administered.
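The dose ranges above are per kilogram of body mass, so the absolute amount of DNA construct for one individual is a simple multiplication. The sketch below shows that arithmetic only; the 70 kg body mass is a hypothetical value chosen for illustration, and the function name is not taken from the source.

```python
# Per-kg dose ranges stated in the text (mg of DNA construct per kg body mass).
BROAD_RANGE_MG_PER_KG = (0.01, 50.0)
NARROW_RANGE_MG_PER_KG = (0.05, 10.0)


def absolute_dose_range_mg(body_mass_kg: float,
                           range_mg_per_kg: tuple[float, float]) -> tuple[float, float]:
    """Scale a per-kg dose range to an absolute range (mg) for one individual."""
    if body_mass_kg <= 0:
        raise ValueError("body mass must be positive")
    low, high = range_mg_per_kg
    return low * body_mass_kg, high * body_mass_kg


if __name__ == "__main__":
    body_mass_kg = 70.0  # hypothetical subject, for illustration only
    for name, rng in (("broad", BROAD_RANGE_MG_PER_KG), ("narrow", NARROW_RANGE_MG_PER_KG)):
        low, high = absolute_dose_range_mg(body_mass_kg, rng)
        # prints "broad: 0.70 mg to 3500.00 mg", then "narrow: 3.50 mg to 700.00 mg"
        print(f"{name}: {low:.2f} mg to {high:.2f} mg")
```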

Delivery Methods

Once formulated, the polynucleotide compositions of the invention can be (1) administered directly to the subject; (2) delivered ex vivo to cells derived from the subject; or (3) used in vitro for expression of recombinant proteins. The subjects to be treated can be mammals or birds, including human subjects.

Direct delivery of the compositions will generally be accomplished by injection, either subcutaneously, intraperitoneally, intravenously or intramuscularly, or by delivery to the interstitial space of a tissue. The compositions can also be administered into a lesion. Other modes of administration include oral and pulmonary administration, suppositories, transdermal or transcutaneous applications (eg. see WO98/20734), needles, and gene guns or hyposprays. Dosage treatment may be a single dose schedule or a multiple dose schedule.

Methods for the ex vivo delivery and reimplantation of transformed cells into a subject are known in the art and described in eg. WO93/14778. Examples of cells useful in ex vivo applications include stem cells (particularly hematopoietic cells), lymph cells, macrophages, dendritic cells, or tumor cells.

Generally, delivery of nucleic acids for both ex vivo and in vitro applications can be accomplished by, for example, dextran-mediated transfection, calcium phosphate precipitation, polybrene-mediated transfection, protoplast fusion, electroporation, encapsulation of the polynucleotide(s) in liposomes, and direct microinjection of the DNA into nuclei, all well known in the art.

Polynucleotide and polypeptide pharmaceutical compositions 

In addition to the pharmaceutically acceptable carriers and salts described above, the following additional agents can be used with polynucleotide and/or polypeptide compositions.

A. Polypeptides

One example is polypeptides, which include, without limitation: asialoorosomucoid (ASOR); transferrin; asialoglycoproteins; antibodies; antibody fragments; ferritin; interleukins; interferons; granulocyte-macrophage colony stimulating factor (GM-CSF); granulocyte colony stimulating factor (G-CSF); macrophage colony stimulating factor (M-CSF); stem cell factor; and erythropoietin. Viral antigens, such as envelope proteins, can also be used. Proteins from other invasive organisms can also be used, such as the 17 amino acid peptide from the circumsporozoite protein of Plasmodium falciparum known as RII.

B. Hormones, Vitamins, etc. 

Other groups that can be included are, for example: hormones, steroids, androgens, estrogens, thyroid hormone, or vitamins such as folic acid.

C. Polyalkylenes, Polysaccharides, etc.

Polyalkylene glycol can also be included with the desired polynucleotides/polypeptides. In a preferred embodiment, the polyalkylene glycol is polyethylene glycol. In addition, mono-, di-, or polysaccharides can be included. In a preferred embodiment of this aspect, the polysaccharide is dextran or DEAE-dextran. Chitosan and poly(lactide-co-glycolide) can also be included.

D. Lipids and Liposomes

The desired polynucleotide/polypeptide can also be encapsulated in lipids or packaged in liposomes 
prior to delivery to the subject or to cells derived therefrom. 

Lipid encapsulation is generally accomplished using liposomes which are able to stably bind or entrap and retain nucleic acid. The ratio of condensed polynucleotide to lipid preparation can vary but will generally be around 1:1 (mg DNA:micromoles lipid), or with more lipid. For a review of the use of liposomes as carriers for delivery of nucleic acids, see Hug and Sleight (1991) Biochim. Biophys. Acta. 1097:1-17; Straubinger (1983) Meth. Enzymol 101:512-527.
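Because the ratio above mixes a mass unit (mg DNA) with a molar unit (micromoles lipid), a small conversion helper can make a formulation calculation explicit. The following is a minimal sketch, not a formulation protocol: the function names are hypothetical, and the 750 g/mol molar mass in the usage lines is a placeholder rather than the value for any particular lipid.

```python
def lipid_umol_for_dna(dna_mg: float, umol_lipid_per_mg_dna: float = 1.0) -> float:
    """Micromoles of lipid for a mass of condensed DNA.

    The default reflects the ~1:1 (mg DNA : micromoles lipid) ratio in the text;
    values above 1.0 represent the "more lipid" case.
    """
    if dna_mg < 0 or umol_lipid_per_mg_dna <= 0:
        raise ValueError("DNA mass must be non-negative and the ratio positive")
    return dna_mg * umol_lipid_per_mg_dna


def lipid_mass_mg(lipid_umol: float, molar_mass_g_per_mol: float) -> float:
    """Convert micromoles to milligrams: umol x g/mol gives ug, divided by 1000 gives mg."""
    return lipid_umol * molar_mass_g_per_mol / 1000.0


if __name__ == "__main__":
    umol = lipid_umol_for_dna(0.5, umol_lipid_per_mg_dna=2.0)  # 2-fold lipid excess
    # 750 g/mol is a hypothetical placeholder molar mass, not a value for a specific lipid.
    print(umol, lipid_mass_mg(umol, 750.0))  # 1.0 0.75
```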

Liposomal preparations for use in the present invention include cationic (positively charged), anionic (negatively charged) and neutral preparations. Cationic liposomes have been shown to mediate intracellular delivery of plasmid DNA (Felgner (1987) Proc. Natl. Acad. Sci. USA 84:7413-7416), mRNA (Malone (1989) Proc. Natl Acad. Sci. USA 86:6077-6081), and purified transcription factors (Debs (1990) J. Biol Chem. 265:10189-10192) in functional form.

Cationic liposomes are readily available. For example, N-[1-(2,3-dioleyloxy)propyl]-N,N,N-trimethylammonium (DOTMA) liposomes are available under the trademark Lipofectin from GIBCO BRL, Grand Island, NY (see also Felgner, supra). Other commercially available liposomes include transfectace (DDAB/DOPE) and DOTAP/DOPE (Boehringer). Other cationic liposomes can be prepared from readily available materials using techniques well known in the art. See, eg., Szoka (1978) Proc. Natl Acad. Sci. USA 75:4194-4198, and WO90/11092 for a description of the synthesis of DOTAP (1,2-bis(oleoyloxy)-3-(trimethylammonio)propane) liposomes.

Similarly, anionic and neutral liposomes are readily available, such as from Avanti Polar Lipids (Birmingham, AL), or can be easily prepared using readily available materials. Such materials include phosphatidyl choline, cholesterol, phosphatidyl ethanolamine, dioleoylphosphatidyl choline (DOPC), dioleoylphosphatidyl glycerol (DOPG), and dioleoylphosphatidyl ethanolamine (DOPE), among others. These materials can also be mixed with the DOTMA and DOTAP starting materials in appropriate ratios. Methods for making liposomes using these materials are well known in the art.

The liposomes can comprise multilamellar vesicles (MLVs), small unilamellar vesicles (SUVs), or large unilamellar vesicles (LUVs). The various liposome-nucleic acid complexes are prepared using methods known in the art. See eg. Straubinger (1983) Meth. Enzymol. 101:512-527; Szoka (1978) Proc. Natl Acad. Sci. USA 75:4194-4198; Papahadjopoulos (1975) Biochim. Biophys. Acta 394:483; Wilson (1979) Cell 17:77; Deamer & Bangham (1976) Biochim. Biophys. Acta 443:629; Ostro (1977) Biochem. Biophys. Res. Commun. 76:836; Fraley (1979) Proc. Natl Acad. Sci. USA 76:3348; Enoch & Strittmatter (1979) Proc. Natl Acad. Sci. USA 76:145; Fraley (1980) J. Biol Chem. 255:10431; Szoka & Papahadjopoulos (1978) Proc. Natl Acad. Sci. USA 75:145; and Schaefer-Ridder (1982) Science 215:166.

E. Lipoproteins

In addition, lipoproteins can be included with the polynucleotide/polypeptide to be delivered. Examples of lipoproteins to be utilized include chylomicrons, HDL, IDL, LDL, and VLDL. Mutants, fragments, or fusions of these proteins can also be used. Modifications of naturally occurring lipoproteins, such as acetylated LDL, can also be used. These lipoproteins can target the delivery of polynucleotides to cells expressing lipoprotein receptors. Preferably, if lipoproteins are included with the polynucleotide to be delivered, no other targeting ligand is included in the composition.

Naturally occurring lipoproteins comprise a lipid and a protein portion. The protein portions are known as apoproteins. At present, apoproteins A, B, C, D, and E have been isolated and identified. At least two of these contain several proteins, designated by Roman numerals: AI, AII, AIV; CI, CII, CIII.

A lipoprotein can comprise more than one apoprotein. For example, naturally occurring chylomicrons comprise apoproteins A, B, C, and E; over time, these lipoproteins lose A and acquire C and E apoproteins. VLDL comprises A, B, C, and E apoproteins, LDL comprises apoprotein B, and HDL comprises apoproteins A, C, and E.

The amino acid sequences of these apoproteins are known and are described in, for example, Breslow (1985) Annu Rev. Biochem 54:699; Law (1986) Adv. Exp Med. Biol. 151:162; Chen (1986) J Biol Chem 261:12918; Kane (1980) Proc Natl Acad Sci USA 77:2465; and Utermann (1984) Hum Genet 65:232.

Lipoproteins contain a variety of lipids, including triglycerides, cholesterol (free and esters), and phospholipids. The composition of the lipids varies in naturally occurring lipoproteins. For example, chylomicrons comprise mainly triglycerides. A more detailed description of the lipid content of naturally occurring lipoproteins can be found, for example, in Meth. Enzymol. 128 (1986). The composition of the lipids is chosen to aid in conformation of the apoprotein for receptor binding activity. The composition of lipids can also be chosen to facilitate hydrophobic interaction and association with the polynucleotide binding molecule.

Naturally occurring lipoproteins can be isolated from serum by ultracentrifugation, for instance. Such methods are described in Meth. Enzymol. (supra); Pitas (1980) J. Biol. Chem. 255:5454-5460 and Mahley (1979) J Clin. Invest 64:743-750. Lipoproteins can also be produced by in vitro or recombinant methods by expression of the apoprotein genes in a desired host cell. See, for example, Atkinson (1986) Annu Rev Biophys Chem 15:403 and Radding (1958) Biochim Biophys Acta 30:443. Lipoproteins can also be purchased from commercial suppliers, such as Biomedical Technologies, Inc., Stoughton, Massachusetts, USA. Further description of lipoproteins can be found in Zuckermann et al PCT/US97/14465.

F. Polycationic Agents

Polycationic agents can be included, with or without lipoprotein, in a composition with the desired polynucleotide/polypeptide to be delivered.

Polycationic agents typically exhibit a net positive charge at physiologically relevant pH and are capable of neutralizing the electrical charge of nucleic acids to facilitate delivery to a desired location. These agents have in vitro, ex vivo, and in vivo applications. Polycationic agents can be used to deliver nucleic acids to a living subject either intramuscularly, subcutaneously, etc.

The following are examples of useful polypeptides as polycationic agents: polylysine, polyarginine, polyornithine, and protamine. Other examples include histones, protamines, human serum albumin, DNA binding proteins, non-histone chromosomal proteins, and coat proteins from DNA viruses, such as φX174. Transcriptional factors also contain domains that bind DNA and therefore may be useful as nucleic acid condensing agents. Briefly, transcriptional factors such as C/EBP, c-jun, c-fos, AP-1, AP-2, AP-3, CPF, Prot-1, Sp-1, Oct-1, Oct-2, CREB, and TFIID contain basic domains that bind DNA sequences.

Organic polycationic agents include spermine, spermidine, and putrescine.

The dimensions and physical properties of a polycationic agent can be extrapolated from the list above to construct other polypeptide polycationic agents or to produce synthetic polycationic agents.

Synthetic polycationic agents which are useful include, for example, DEAE-dextran and polybrene. Lipofectin™ and lipofectAMINE™ are monomers that form polycationic complexes when combined with polynucleotides/polypeptides.

Immunodiagnostic Assays

Neisserial antigens of the invention can be used in immunoassays to detect antibody levels (or, conversely, anti-Neisserial antibodies can be used to detect antigen levels). Immunoassays based on well-defined, recombinant antigens can be developed to replace invasive diagnostic methods. Antibodies to Neisserial proteins can be detected within biological samples, including, for example, blood or serum samples. Design of the immunoassays is subject to a great deal of variation, and a variety of these are known in the art. Protocols for the immunoassay may be based, for example, upon competition, direct reaction, or sandwich type assays. Protocols may also, for example, use solid supports, or may be by immunoprecipitation. Most assays involve the use of labeled antibody or polypeptide; the labels may be, for example, fluorescent, chemiluminescent, radioactive, or dye molecules. Assays which amplify the signals from the probe are also known; examples include assays which utilize biotin and avidin, and enzyme-labeled and mediated immunoassays, such as ELISA assays.

Kits suitable for immunodiagnosis and containing the appropriate labeled reagents are constructed by packaging the appropriate materials, including the compositions of the invention, in suitable containers, along with the remaining reagents and materials (for example, suitable buffers, salt solutions, etc.) required for the conduct of the assay, as well as a suitable set of assay instructions.

Nucleic Acid Hybridisation 

"Hybridization" refers to the association of two nucleic acid sequences to one another by hydrogen 
bonding. Typically, one sequence will be fixed to a solid support and the other will be free in solution. Then, the two sequences will be placed in contact with one another under conditions that favor hydrogen bonding. Factors that affect this bonding include: the type and volume of solvent; reaction temperature; time of hybridization; agitation; agents to block the non-specific attachment of the liquid phase sequence to the solid support (Denhardt's reagent or BLOTTO); concentration of the sequences; use of compounds to increase the rate of association of sequences (dextran sulfate or polyethylene glycol); and the stringency of the washing conditions following hybridization. See Sambrook et al [supra] Volume 2, chapter 9, pages 9.47 to 9.57.

"Stringency" refers to conditions in a hybridization reaction that favor association of very similar 
sequences over sequences that differ. For example, the combination of temperature and salt concentration should be chosen so that the hybridization temperature is approximately 12 to 20°C below the calculated Tm of the hybrid under study. The temperature and salt conditions can often be determined empirically in preliminary experiments in which samples of genomic DNA immobilized on filters are hybridized to the sequence of interest and then washed under conditions of different stringencies. See Sambrook et al at page 9.50.

Variables to consider when performing, for example, a Southern blot are (1) the complexity of the DNA being blotted and (2) the homology between the probe and the sequences being detected. The total amount of the fragment(s) to be studied can vary over several orders of magnitude, from 0.1 to 1 µg for a plasmid or phage digest to 10⁻⁹ to 10⁻⁸ g for a single-copy gene in a highly complex eukaryotic genome. For lower complexity polynucleotides, substantially shorter blotting, hybridization, and exposure times, a smaller amount of starting polynucleotides, and lower specific activity of probes can be used. For example, a single-copy yeast gene can be detected with an exposure time of only 1 hour starting with 1 ng of yeast DNA, blotting for two hours, and hybridizing for 4-8 hours with a probe of 10⁸ cpm/µg. For a single-copy mammalian gene, a conservative approach would start with 10 µg of DNA, blot overnight, and hybridize overnight in the presence of 10% dextran sulfate using a probe of greater than 10⁸ cpm/µg, resulting in an exposure time of ~24 hours.

Several factors can affect the melting temperature (Tm) of a DNA-DNA hybrid between the probe and the fragment of interest, and consequently, the appropriate conditions for hybridization and washing. In many cases the probe is not 100% homologous to the fragment. Other commonly encountered variables include the length and total G+C content of the hybridizing sequences and the ionic strength and formamide content of the hybridization buffer. The effects of all of these factors can be approximated by a single equation:

$$T_m = 81 + 16.6\,\log_{10} C_i + 0.4\,[\%(G+C)] - 0.6\,(\%\text{formamide}) - \frac{600}{n} - 1.5\,(\%\text{mismatch})$$

where Ci is the salt concentration (monovalent ions) and n is the length of the hybrid in base pairs (slightly modified from Meinkoth & Wahl (1984) Anal Biochem. 138:267-284).
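The Tm equation translates directly into a function, which makes it easy to see how each term shifts the result (for example, the mismatch term lowers Tm by 1.5°C for every 1% mismatch). This is a minimal sketch of the equation as written above; the function name is hypothetical, and treating Ci as a molar concentration is an assumption about units that the text does not state.

```python
import math


def melting_temperature(salt_molar: float, gc_percent: float, formamide_percent: float,
                        length_bp: int, mismatch_percent: float = 0.0) -> float:
    """Approximate Tm (deg C) of a DNA-DNA hybrid from the equation in the text.

    salt_molar is Ci, the monovalent ion concentration (assumed here to be in mol/L);
    gc_percent, formamide_percent and mismatch_percent are percentages (0-100);
    length_bp is n, the hybrid length in base pairs.
    """
    if salt_molar <= 0 or length_bp <= 0:
        raise ValueError("salt concentration and hybrid length must be positive")
    return (81.0
            + 16.6 * math.log10(salt_molar)
            + 0.4 * gc_percent
            - 0.6 * formamide_percent
            - 600.0 / length_bp
            - 1.5 * mismatch_percent)


if __name__ == "__main__":
    # Hypothetical probe: 500 bp, 50% G+C, 1 M monovalent salt, no formamide.
    perfect = melting_temperature(1.0, 50.0, 0.0, 500)
    # Same probe with 10% mismatch: 10 x 1.5 = 15 deg C lower.
    mismatched = melting_temperature(1.0, 50.0, 0.0, 500, mismatch_percent=10.0)
    print(round(perfect, 1), round(mismatched, 1))  # 99.8 84.8
```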

In designing a hybridization experiment, some factors affecting nucleic acid hybridization can be conveniently altered. The temperature of the hybridization and washes and the salt concentration during the washes are the simplest to adjust. As the temperature of the hybridization increases (ie. stringency), it becomes less likely for hybridization to occur between strands that are nonhomologous, and as a result, background decreases. If the radiolabeled probe is not completely homologous with the immobilized fragment (as is frequently the case in gene family and interspecies hybridization experiments), the hybridization temperature must be reduced, and background will increase. The temperature of the washes affects the intensity of the hybridizing band and the degree of background in a similar manner. The stringency of the washes is also increased with decreasing salt concentrations.

In general, convenient hybridization temperatures in the presence of 50% formamide are 42°C for a probe that is 95% to 100% homologous to the target fragment, 37°C for 90% to 95% homology, and 32°C for 85% to 90% homology. For lower homologies, formamide content should be lowered and temperature adjusted accordingly, using the equation above. If the homology between the probe and the target fragment is not known, the simplest approach is to start with both hybridization and wash conditions which are nonstringent. If non-specific bands or high background are observed after autoradiography, the filter can be washed at high stringency and reexposed. If the time required for exposure makes this approach impractical, several hybridization and/or washing stringencies should be tested in parallel.
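The homology bands for 50% formamide can be encoded as a small lookup. The sketch below reproduces only the three temperatures given in the text and returns nothing for homology below 85%, where the text instead recommends lowering the formamide content and recalculating. The function name is hypothetical, and assigning the shared boundary values (95% and 90%) to the higher band is an assumption, since the text lists them in two bands.

```python
from typing import Optional


def hybridization_temp_50pct_formamide(homology_percent: float) -> Optional[float]:
    """Convenient hybridization temperature (deg C) in 50% formamide, per the text.

    Returns None below 85% homology, where formamide should be lowered and the
    temperature recalculated from the Tm equation. Boundary values (95%, 90%)
    are assigned to the higher band, an assumption made for this sketch.
    """
    if not 0.0 <= homology_percent <= 100.0:
        raise ValueError("homology must be a percentage between 0 and 100")
    if homology_percent >= 95.0:
        return 42.0
    if homology_percent >= 90.0:
        return 37.0
    if homology_percent >= 85.0:
        return 32.0
    return None
```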

Nucleic Acid Probe Assays 

Methods such as PCR, branched DNA probe assays, or blotting techniques utilizing nucleic acid probes according to the invention can determine the presence of cDNA or mRNA. A probe is said to "hybridize" with a sequence of the invention if it can form a duplex or double stranded complex which is stable enough to be detected.

The nucleic acid probes will hybridize to the Neisserial nucleotide sequences of the invention (including both sense and antisense strands). Though many different nucleotide sequences will encode the amino acid sequence, the native Neisserial sequence is preferred because it is the actual sequence present in cells. mRNA represents a coding sequence and so a probe should be complementary to the coding sequence; single-stranded cDNA is complementary to mRNA, and so a cDNA probe should be complementary to the non-coding sequence.
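The strand logic above can be made concrete with a reverse-complement helper: a DNA probe for mRNA is the reverse complement of the coding strand, while a probe for first-strand cDNA (which is itself complementary to mRNA) has the same sense as the coding strand. This is an illustrative sketch only; the function names and the six-base sequence are hypothetical and are not Neisserial sequence.

```python
# Watson-Crick complements for DNA; N (any base) maps to itself.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence, written 5' to 3'."""
    return seq.translate(_COMPLEMENT)[::-1]


def probe_for_mrna(coding_strand: str) -> str:
    """DNA probe for mRNA: complementary to the coding sequence."""
    return reverse_complement(coding_strand)


def probe_for_first_strand_cdna(coding_strand: str) -> str:
    """DNA probe for single-stranded cDNA: complementary to the non-coding
    strand, i.e. the same sense as the coding strand."""
    return coding_strand


if __name__ == "__main__":
    coding = "ATGGCC"  # hypothetical coding-strand fragment
    print(probe_for_mrna(coding))               # GGCCAT
    print(probe_for_first_strand_cdna(coding))  # ATGGCC
```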

The probe sequence need not be identical to the Neisserial sequence (or its complement); some variation in the sequence and length can lead to increased assay sensitivity if the nucleic acid probe can form a duplex with target nucleotides, which can be detected. Also, the nucleic acid probe can include additional nucleotides to stabilize the formed duplex. Additional Neisserial sequence may also be helpful as a label to detect the formed duplex. For example, a non-complementary nucleotide sequence may be attached to the 5' end of the probe, with the remainder of the probe sequence being complementary to a Neisserial sequence. Alternatively, non-complementary bases or longer sequences can be interspersed into the probe, provided that the probe sequence has sufficient complementarity with a Neisserial sequence in order to hybridize therewith and thereby form a duplex which can be detected.

The exact length and sequence of the probe will depend on the hybridization conditions, such as temperature, salt condition and the like. For example, for diagnostic applications, depending on the complexity of the analyte sequence, the nucleic acid probe typically contains at least 10-20 nucleotides, preferably 15-25, and more preferably at least 30 nucleotides, although it may be shorter than this. Short primers generally require cooler temperatures to form sufficiently stable hybrid complexes with the template.

Probes may be produced by synthetic procedures, such as the triester method of Matteucci et al [J. Am. Chem. Soc. (1981) 103:3185], or according to Urdea et al [Proc. Natl. Acad. Sci. USA (1983) 80:7461], or using commercially available automated oligonucleotide synthesizers.

The chemical nature of the probe can be selected according to preference. For certain applications, DNA or RNA are appropriate. For other applications, modifications may be incorporated: eg. backbone modifications, such as phosphorothioates or methylphosphonates, can be used to increase in vivo half-life, alter RNA affinity, increase nuclease resistance etc. [eg. see Agrawal & Iyer (1995) Curr Opin Biotechnol 6:12-19; Agrawal (1996) TIBTECH 14:376-387]; analogues such as peptide nucleic acids may also be used [eg. see Corey (1997) TIBTECH 15:224-229; Buchardt et al (1993) TIBTECH 11:384-386].

Alternatively, the polymerase chain reaction (PCR) is another well-known means for detecting small amounts of target nucleic acids. The assay is described in: Mullis et al [Meth. Enzymol. (1987) 155:335-350]; US patents 4,683,195 and 4,683,202. Two "primer" oligonucleotides hybridize with the target nucleic acids and are used to prime the reaction. The primers can comprise sequence that does not hybridize to the sequence of the amplification target (or its complement) to aid with duplex stability or, for example, to incorporate a convenient restriction site. Typically, such sequence will flank the desired Neisserial sequence.
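The idea of primers carrying extra, non-hybridizing sequence can be sketched as follows: the forward primer copies the first bases of the target, the reverse primer is the reverse complement of the last bases, and a tail (here a restriction site) is prepended to the 5' end of each. The function name, the 6-base annealing length, and the target string are hypothetical; the EcoRI site GAATTC is used only as an example of "a convenient restriction site", and real primer design would also consider length, Tm and secondary structure.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an upper-case DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def tailed_primers(target: str, anneal_len: int, tail: str) -> tuple[str, str]:
    """Forward/reverse primers for `target`, each with a 5' non-hybridizing tail.

    The forward primer matches the first `anneal_len` bases of the top strand;
    the reverse primer is the reverse complement of the last `anneal_len` bases.
    `tail` (e.g. a restriction site) is prepended to the 5' end of both.
    """
    if anneal_len <= 0 or anneal_len > len(target):
        raise ValueError("anneal_len must be between 1 and len(target)")
    forward = tail + target[:anneal_len]
    reverse = tail + reverse_complement(target[-anneal_len:])
    return forward, reverse


if __name__ == "__main__":
    ECORI_SITE = "GAATTC"  # EcoRI recognition sequence, chosen only as an example
    target = "ATGCCCGGGTTTAAACCC"  # hypothetical stand-in for a Neisserial sequence
    print(tailed_primers(target, 6, ECORI_SITE))  # ('GAATTCATGCCC', 'GAATTCGGGTTT')
```

In practice a few extra bases are often added 5' of a restriction site so that the enzyme can cut close to the end of the product; that refinement is left out of this sketch.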

A thermostable polymerase creates copies of target nucleic acids from the primers using the original target nucleic acids as a template. After a threshold amount of target nucleic acids is generated by the polymerase, they can be detected by more traditional methods, such as Southern blots. When using the Southern blot method, the labelled probe will hybridize to the Neisserial sequence (or its complement).
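How quickly a "threshold amount" is reached follows from the exponential nature of PCR: in an idealized reaction each cycle multiplies the copy number by (1 + E), where E is the per-cycle efficiency. The sketch below assumes that idealized model, which is not stated in the text; real reactions run below 100% efficiency and eventually plateau. Function names and the example copy numbers are hypothetical.

```python
import math


def copies_after(initial_copies: float, cycles: int, efficiency: float = 1.0) -> float:
    """Idealized PCR yield N = N0 * (1 + E) ** cycles.

    efficiency E = 1.0 means perfect doubling each cycle (an assumption;
    real reactions fall below this and eventually plateau).
    """
    return initial_copies * (1.0 + efficiency) ** cycles


def cycles_to_reach(initial_copies: float, threshold_copies: float,
                    efficiency: float = 1.0) -> int:
    """Smallest whole number of cycles for the idealized yield to reach the threshold."""
    if initial_copies <= 0 or threshold_copies <= 0 or efficiency <= 0:
        raise ValueError("copy numbers and efficiency must be positive")
    if threshold_copies <= initial_copies:
        return 0
    n = math.ceil(math.log(threshold_copies / initial_copies) / math.log(1.0 + efficiency))
    # Guard against floating-point overshoot when the ratio is an exact power of (1 + E).
    if n > 0 and copies_after(initial_copies, n - 1, efficiency) >= threshold_copies:
        n -= 1
    return n


if __name__ == "__main__":
    # Hypothetical: 10 starting copies, threshold of one million copies.
    print(cycles_to_reach(10, 1e6))  # 17
```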

Also, mRNA or cDNA can be detected by traditional blotting techniques described in Sambrook et al [supra]. mRNA, or cDNA generated from mRNA using a polymerase enzyme, can be purified and separated using gel electrophoresis. The nucleic acids on the gel are then blotted onto a solid support, such as nitrocellulose. The solid support is exposed to a labelled probe and then washed to remove any unhybridized probe. Next, the duplexes containing the labeled probe are detected. Typically, the probe is labelled with a radioactive moiety.

BRIEF DESCRIPTION OF THE DRAWINGS 

Figures 1-20 show biochemical data obtained in the Examples, and also sequence analysis, for ORFs 37, 5, 2, 15, 22, 28, 32, 4, 61, 76, 89, 97, 106, 138, 23, 25, 27, 79, 85 and 132. M1 and M2 are molecular weight markers. Arrows indicate the position of the main recombinant product or, in Western blots, the position of the main N. meningitidis immunoreactive band. TP indicates N. meningitidis total protein extract; OMV indicates N. meningitidis outer membrane vesicle preparation. In bactericidal assay results: a diamond (♦) shows preimmune data; a triangle (▲) shows GST control data; a circle (•) shows data with recombinant N. meningitidis protein. Computer analyses show a hydrophilicity plot (upper), an antigenic index plot (middle), and an AMPHI analysis (lower). The AMPHI program has been used to predict T-cell epitopes [Gao et al (1989) J. Immunol 143:3007; Roberts et al (1996) AIDS Res Hum Retrovir 12:593; Quakyi et al (1992) Scand J Immunol suppl. 11:9] and is available in the Protean package of DNASTAR, Inc. (1228 South Park Street, Madison, Wisconsin 53715 USA).

EXAMPLES 

The examples describe nucleic acid sequences which have been identified in N. meningitidis, along with their putative translation products, and also those of N. gonorrhoeae. Not all of the nucleic acid sequences are complete ie. they encode less than the full-length wild-type protein.

The examples are generally in the following format:

• a nucleotide sequence which has been identified in N. meningitidis (strain B) 

• the putative translation product of this sequence 

• a computer analysis of the translation product based on database comparisons 

• corresponding gene and protein sequences identified in N. meningitidis (strain A) and in N. gonorrhoeae

• a description of the characteristics of the proteins which indicates that they might be 
suitably antigenic 

• results of biochemical analysis (expression, purification, ELISA, FACS etc.) 
The examples typically include details of sequence identity between species and strains. Proteins
that are similar in sequence are generally similar in both structure and function, and sequence
identity often indicates a common evolutionary origin. Comparison with sequences of proteins of
known function is widely used as a guide for the assignment of putative protein function to a new
sequence and has proved particularly useful in whole-genome analyses.
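Identity figures of the kind quoted in the Examples (for instance "68.0% identity over a 75aa overlap") are computed from a pairwise alignment. The sketch below shows one common convention, assuming the two sequences are already aligned with '-' as the gap character: identical residues are divided by the number of aligned columns in which neither sequence has a gap. BLAST and FASTA may count gapped columns differently, so this is illustrative rather than a reproduction of either program.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> tuple[float, int]:
    """Return (percent identity, overlap length) for two pre-aligned sequences.

    Overlap = columns where neither sequence has a gap (one common convention;
    alignment programs differ in how gapped columns are counted).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(x, y) for x, y in zip(aligned_a.upper(), aligned_b.upper())
             if x != gap and y != gap]
    if not pairs:
        return 0.0, 0
    identical = sum(1 for x, y in pairs if x == y)
    return 100.0 * identical / len(pairs), len(pairs)


if __name__ == "__main__":
    # Hypothetical toy alignment, not taken from the Examples.
    pct, overlap = percent_identity("MKQTV-WLAA", "MKQTVKWLSA")
    print(f"{pct:.1f}% identity over a {overlap}aa overlap")
```

The gapped column is excluded from the overlap, so the toy alignment reports 8 identities out of 9 aligned columns.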

Sequence comparisons were performed at NCBI (http://www.ncbi.nlm.nih.gov) using the
algorithms BLAST, BLAST2, BLASTn, BLASTp, tBLASTn, BLASTx, & tBLASTx [e.g. see also
Altschul et al (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database
search programs. Nucleic Acids Research 25:3389-3402]. Searches were performed against the
following databases: non-redundant GenBank+EMBL+DDBJ+PDB sequences and non-redundant
GenBank CDS translations+PDB+SwissProt+SPupdate+PIR sequences.

To compare Meningococcal and Gonococcal sequences, the tBLASTx algorithm was used, as 
implemented at http://www.genome.ou.edu/gono_blast.html. The FASTA algorithm was also used 
to compare the ORFs (from GCG Wisconsin Package, version 9.0). 

Dots within nucleotide sequences (e.g. position 495 in SEQ ID 11) represent nucleotides which have
been arbitrarily introduced in order to maintain a reading frame. In the same way, double-underlined
nucleotides were removed. Lower case letters (e.g. position 496 in SEQ ID 11) represent
ambiguities which arose during alignment of independent sequencing reactions (some of the
nucleotide sequences in the examples are derived from combining the results of two or more
experiments).

Nucleotide sequences were scanned in all six reading frames to predict the presence of hydrophobic 
domains using an algorithm based on the statistical studies of Esposti et al [Critical evaluation of 
the hydropathy of membrane proteins (1990) Eur J Biochem 190:207-219]. These domains 
represent potential transmembrane regions or hydrophobic leader sequences. 
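The Esposti et al. scale itself is not reproduced here. To illustrate how a window-based hydropathy scan flags candidate transmembrane or leader segments, the sketch below substitutes the widely used Kyte-Doolittle scale. The window of 19 residues and the threshold of 1.6 are common choices for that scale and are assumptions, not parameters taken from this document.

```python
# Kyte-Doolittle hydropathy values (substituted here for the Esposti et al. scale).
KD = {"A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
      "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
      "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2}


def hydropathy_profile(protein: str, window: int = 19) -> list[float]:
    """Mean hydropathy of each full window; unknown residues (X, *) score 0."""
    scores = [KD.get(aa, 0.0) for aa in protein.upper()]
    return [sum(scores[i:i + window]) / window
            for i in range(len(scores) - window + 1)]


def hydrophobic_segments(protein: str, window: int = 19,
                         threshold: float = 1.6) -> list[tuple[int, int]]:
    """(start, end) spans, 0-based and end-exclusive, whose window mean exceeds threshold."""
    segments: list[tuple[int, int]] = []
    for i, value in enumerate(hydropathy_profile(protein, window)):
        if value > threshold:
            start, end = i, i + window
            if segments and start <= segments[-1][1]:
                segments[-1] = (segments[-1][0], end)  # merge overlapping windows
            else:
                segments.append((start, end))
    return segments
```

Overlapping high-scoring windows are merged, so a single hydrophobic stretch is reported as one span rather than many.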

Open reading frames were predicted from fragmented nucleotide sequences using the program
ORFFINDER (NCBI).
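Scanning all six reading frames amounts to translating both strands at three offsets each. The sketch below uses the standard genetic code and follows the conventions stated above for the Examples. Dots (placeholders introduced to keep the reading frame) and N are treated as unknown, so any codon containing them translates to X. Lower-case ambiguity calls are read as their upper-case base, which is an assumption of this sketch. It illustrates the idea and is not the ORFFINDER implementation.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order; '*' marks stop codons.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGTN.", "TGCAN.")


def clean(seq: str) -> str:
    """Drop whitespace and position numbers; upper-case lower-case ambiguity calls."""
    return "".join(ch for ch in seq if not ch.isspace() and not ch.isdigit()).upper()


def translate(seq: str) -> str:
    """Translate codon by codon; codons containing '.', N or other symbols give X."""
    seq = clean(seq)
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, len(seq) - 2, 3))


def reverse_complement(seq: str) -> str:
    return clean(seq).translate(COMPLEMENT)[::-1]


def six_frames(seq: str) -> dict[str, str]:
    """Frames +1..+3 on the given strand and -1..-3 on its reverse complement."""
    fwd, rev = clean(seq), reverse_complement(seq)
    frames = {f"+{k + 1}": translate(fwd[k:]) for k in range(3)}
    frames.update({f"-{k + 1}": translate(rev[k:]) for k in range(3)})
    return frames


if __name__ == "__main__":
    # First 30 nucleotides of SEQ ID 1, including the frame-keeping dot.
    print(six_frames("ATGAAACAGA CAGTCAA.AT GCTTGCCGCC")["+1"])  # MKQTVXMLAA
```

Frame +1 of this fragment reproduces the first ten residues of SEQ ID 2, with the dot-containing codon giving X.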

Underlined amino acid sequences indicate possible transmembrane domains or leader sequences 
in the ORFs, as predicted by the PSORT algorithm (http://www.psort.nibb.ac.jp). Functional 
domains were also predicted using the MOTIFS program (GCG Wisconsin & PROSITE). 
Various tests can be used to assess the in vivo immunogenicity of the proteins identified in the
examples. For example, the proteins can be expressed recombinantly and used to screen patient sera
by immunoblot. A positive reaction between the protein and patient serum indicates that the patient
has previously mounted an immune response to the protein in question, i.e. the protein is an
immunogen. This method can also be used to identify immunodominant proteins.

The recombinant protein can also be conveniently used to prepare antibodies eg. in a mouse. These 
can be used for direct confirmation that a protein is located on the cell-surface. Labelled antibody 
(eg. fluorescent labelling for FACS) can be incubated with intact bacteria and the presence of label 
on the bacterial surface confirms the location of the protein. 

In particular, the following methods (A) to (S) were used to express, purify and biochemically
characterise the proteins of the invention:

A) Chromosomal DNA preparation 

N. meningitidis strain 2996 was grown to exponential phase in 100ml of GC medium, harvested by
centrifugation, and resuspended in 5ml buffer (20% sucrose, 50mM Tris-HCl, 50mM EDTA, pH 8).
After 10 minutes incubation on ice, the bacteria were lysed by adding 10ml lysis solution (50mM
NaCl, 1% Na-Sarkosyl, 50μg/ml Proteinase K), and the suspension was incubated at 37°C for 2
hours. Two phenol extractions (equilibrated to pH 8) and one CHCl3/isoamyl alcohol (24:1)
extraction were performed. DNA was precipitated by addition of 0.3M sodium acetate and 2
volumes of ethanol, and was collected by centrifugation. The pellet was washed once with 70%
ethanol and redissolved in 4ml buffer (10mM Tris-HCl, 1mM EDTA, pH 8). The DNA
concentration was measured by reading the OD at 260 nm.

B) Oligonucleotide design 

Synthetic oligonucleotide primers were designed on the basis of the coding sequence of each ORF,
using (a) the meningococcus B sequence when available, or (b) the gonococcus/meningococcus A
sequence, adapted to the codon usage preference of meningococcus as necessary. Any predicted
signal peptides were omitted by designing the 5'-end amplification primer immediately
downstream of the predicted leader sequence.

For most ORFs, the 5' primers included two restriction enzyme recognition sites (BamHI-NdeI,
BamHI-NheI, or EcoRI-NheI, depending on the gene's own restriction pattern); the 3' primers included
an XhoI restriction site. This procedure was established in order to direct the cloning of each
amplification product (corresponding to each ORF) into two different expression systems: pGEX-KG
(using either BamHI-XhoI or EcoRI-XhoI), and pET-21b+ (using either NdeI-XhoI or NheI-XhoI).

5'-end primer tails:  CGC GGATCCCATATG    (BamHI-NdeI)
                      CGC GGATCCGCTAGC    (BamHI-NheI)
                      CCG GAATTCTAGCTAGC  (EcoRI-NheI)

3'-end primer tail:   CCCG CTCGAG         (XhoI)
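Choosing among the three 5' tail options "depending on the gene's own restriction pattern" essentially means avoiding enzymes that also cut inside the ORF. The sketch below illustrates that check using the standard recognition sequences of the five enzymes named above. The preference order of the options and the function names are assumptions made for illustration. All five sites are palindromic, so searching one strand is sufficient.

```python
# Recognition sequences of the enzymes used in the primer tails.
SITES = {"BamHI": "GGATCC", "NdeI": "CATATG", "NheI": "GCTAGC",
         "EcoRI": "GAATTC", "XhoI": "CTCGAG"}

# 5' tail options, in an assumed order of preference.
FIVE_PRIME_OPTIONS = [("BamHI", "NdeI"), ("BamHI", "NheI"), ("EcoRI", "NheI")]


def internal_sites(orf: str, enzymes) -> list[str]:
    """Enzymes whose (palindromic) recognition site occurs inside the ORF."""
    orf = orf.upper()
    return [e for e in enzymes if SITES[e] in orf]


def choose_five_prime_tail(orf: str):
    """First 5' option that does not cut the ORF, or None if XhoI or every option cuts."""
    if internal_sites(orf, ["XhoI"]):
        return None  # the shared 3' XhoI tail would cut inside the ORF
    for pair in FIVE_PRIME_OPTIONS:
        if not internal_sites(orf, pair):
            return pair
    return None
```

For example, an ORF containing an internal GGATCC rules out both BamHI options, leaving EcoRI-NheI.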

For ORFs 5, 15, 17, 19, 20, 22, 27, 28, 65 & 89, two different amplifications were performed to
clone each ORF in the two expression systems. Two different 5' primers were used for each ORF;
the same 3' XhoI primer was used as before:

5'-end primer tail: GGAATTC CATATG GCCATGG (NdeI)

5'-end primer tail: CG GGATCC (BamHI)

ORF 76 was cloned in the pTRC expression vector and expressed as an amino-terminal His-tag
fusion. In this particular case, the predicted signal peptide was included in the final product.
NheI-BamHI restriction sites were incorporated using primers:

5'-end primer tail: GATCAGCTAGCCATATG (NheI)

3'-end primer tail: CG GGATCC (BamHI)

As well as containing the restriction enzyme recognition sequences, the primers included
nucleotides which hybridized to the sequence to be amplified. The number of hybridizing
nucleotides depended on the melting temperature of the whole primer, and was determined for each
primer using the formulae:

Tm = 4 × (G+C) + 2 × (A+T)   (tail excluded)

Tm = 64.9 + 0.41 × (%GC) - 600/N   (whole primer; N = primer length in nucleotides)

The average melting temperatures of the selected oligos were 65-70°C for the whole oligo and
50-55°C for the hybridising region alone.
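The two formulae can be applied directly, as in the sketch below. The tail is the BamHI-NdeI 5' tail listed above; the 18-nucleotide hybridizing region is a hypothetical example, not a primer taken from Table I.

```python
def tm_wallace(seq: str) -> float:
    """Tm = 4(G+C) + 2(A+T); applied to the hybridizing region (tail excluded)."""
    s = seq.upper()
    return 4 * (s.count("G") + s.count("C")) + 2 * (s.count("A") + s.count("T"))


def tm_gc(seq: str) -> float:
    """Tm = 64.9 + 0.41(%GC) - 600/N, N = primer length; applied to the whole primer."""
    s = seq.upper()
    n = len(s)
    pct_gc = 100.0 * (s.count("G") + s.count("C")) / n
    return 64.9 + 0.41 * pct_gc - 600.0 / n


if __name__ == "__main__":
    tail = "CGCGGATCCCATATG"          # BamHI-NdeI 5' tail
    region = "GATGACGTATCGGATTTT"     # hypothetical hybridizing region
    print(f"hybridizing region: {tm_wallace(region):.1f} °C (target 50-55)")
    print(f"whole primer:       {tm_gc(tail + region):.1f} °C (target 65-70)")
```

Lengthening or shortening the hybridizing region and re-running both functions is a simple way to bring each value into its target window.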

Table I (page 487) shows the forward and reverse primers used for each amplification. In certain
cases, the sequence of the primer does not exactly match the sequence in the
ORF. When initial amplifications were performed, the complete 5' and/or 3' sequence was not
known for some meningococcal ORFs, although the corresponding sequences had been identified
in gonococcus. For amplification, the gonococcal sequences could thus be used as the basis for
primer design, altered to take account of codon preference. In particular, the following codons were
changed: ATA→ATT; TCG→TCT; CAG→CAA; AAG→AAA; GAG→GAA; CGA→CGC;
CGG→CGC; GGG→GGC. Italicised nucleotides in Table I indicate such a change. Once the
complete sequence has been identified, this approach is generally no
longer necessary.
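The codon changes listed above are all synonymous, since each pair encodes the same amino acid. They therefore adjust the primer's match to the meningococcal template without altering the encoded protein. The minimal sketch below assumes the input is an in-frame stretch of gonococcal coding-strand sequence, with the substitution applied before any reverse complement is taken for a 3' primer. The function name is illustrative.

```python
# Gonococcal -> meningococcal codon substitutions listed in the text (all synonymous).
CODON_SWAPS = {"ATA": "ATT", "TCG": "TCT", "CAG": "CAA", "AAG": "AAA",
               "GAG": "GAA", "CGA": "CGC", "CGG": "CGC", "GGG": "GGC"}


def adapt_codons(coding_seq: str) -> tuple[str, list[int]]:
    """Return the adapted sequence and the 0-based indices of the changed codons."""
    s = coding_seq.upper()
    if len(s) % 3:
        raise ValueError("sequence must be an in-frame run of whole codons")
    out, changed = [], []
    for k in range(0, len(s), 3):
        codon = s[k:k + 3]
        new = CODON_SWAPS.get(codon, codon)
        if new != codon:
            changed.append(k // 3)
        out.append(new)
    return "".join(out), changed
```

For example, `adapt_codons("ATGCAGAAGGGGTTA")` returns `("ATGCAAAAAGGCTTA", [1, 2, 3])`. The changed codons are the positions that would appear italicised in a primer listing.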

Oligos were synthesized on a Perkin Elmer 394 DNA/RNA Synthesizer, eluted from the columns
in 2ml NH4OH, and deprotected by 5 hours incubation at 56°C. The oligos were precipitated by
addition of 0.3M Na-acetate and 2 volumes of ethanol. The samples were then centrifuged and the
pellets resuspended in either 100μl or 1ml of water. OD260 was determined using a Perkin Elmer
Lambda Bio spectrophotometer, and the concentration was determined and adjusted to 2-10pmol/μl.

C) Amplification

The standard PCR protocol was as follows: 50-200ng of genomic DNA were used as a template
in the presence of 20-40μM of each oligo, 400-800μM dNTPs solution, 1x PCR buffer (including
1.5mM MgCl2) and 2.5 units of DNA polymerase (Perkin-Elmer AmpliTaq, GIBCO
Platinum, Pwo DNA polymerase, or Takara Shuzo Taq polymerase).

In some cases, PCR was optimised by the addition of 10μl DMSO or 50μl 2M betaine.

After a hot start (adding the polymerase during a preliminary 3 minute incubation of the whole mix
at 95°C), each sample underwent a two-step amplification: the first 5 cycles were performed
at the hybridization temperature of the oligos excluding the restriction-enzyme tail,
followed by 30 cycles performed at the hybridization temperature of the whole-length
oligos. The cycles were followed by a final 10 minute extension step at 72°C.

The standard cycles were as follows:

Step             First 5 cycles          Last 30 cycles
Denaturation     30 seconds, 95°C        30 seconds, 95°C
Hybridisation    30 seconds, 50-55°C     30 seconds, 65-70°C
Elongation       30-60 seconds, 72°C     30-60 seconds, 72°C


The elongation time varied according to the length of the ORF to be amplified. 
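The hot start, the two cycling stages and the final extension can be written down as a single program, which also makes the total run time easy to estimate. In this sketch the elongation time is derived from the ORF length using an assumed rule of thumb of about one minute per kilobase, clamped to the 30-60 second range in the table. Ramp times are ignored. Neither assumption is part of the protocol above.

```python
import math

HOT_START_S = 180   # preliminary 3 min at 95°C, polymerase added
FINAL_EXT_S = 600   # final 10 min extension at 72°C


def elongation_seconds(orf_bp: int) -> int:
    """Assumed rule of thumb: ~60 s per kb, clamped to the 30-60 s used in the protocol."""
    return max(30, min(60, math.ceil(orf_bp * 60 / 1000)))


def pcr_program(orf_bp: int) -> list[tuple[str, int, str, int]]:
    """Rows of (stage, cycles, step, seconds); temperature ranges are kept as labels."""
    elong = elongation_seconds(orf_bp)
    rows = []
    for stage, cycles, anneal in (("first", 5, "50-55°C"), ("last", 30, "65-70°C")):
        rows += [(stage, cycles, "denaturation 95°C", 30),
                 (stage, cycles, f"hybridisation {anneal}", 30),
                 (stage, cycles, "elongation 72°C", elong)]
    return rows


def total_seconds(orf_bp: int) -> int:
    """Hot start + all cycles + final extension, ignoring ramp times."""
    cycling = sum(cycles * secs for _, cycles, _, secs in pcr_program(orf_bp))
    return HOT_START_S + cycling + FINAL_EXT_S
```

For a 1 kb ORF this gives 60 s elongation and 4980 s (83 minutes) of programmed time.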

The amplifications were performed using either a 9600 or a 2400 Perkin Elmer GeneAmp PCR
System. To check the results, 1/10 of the amplification volume was loaded onto a 1-1.5% agarose
gel and the size of each amplified fragment was compared with a DNA molecular weight marker.

The amplified DNA was either loaded directly on a 1% agarose gel or first precipitated with ethanol
and resuspended in a suitable volume to be loaded on a 1% agarose gel. The DNA fragment
corresponding to the band of the expected size was then eluted and purified from the gel using the
Qiagen Gel Extraction Kit, following the manufacturer's instructions. The final volume of the DNA
fragment was 30μl or 50μl of either water or 10mM Tris, pH 8.5.

D) Digestion of PCR fragments

The purified DNA corresponding to the amplified fragment was split into 2 aliquots and
double-digested with:

- NdeI/XhoI or NheI/XhoI for cloning into pET-21b+ and further expression of the protein
as a C-terminal His-tag fusion;

- BamHI/XhoI or EcoRI/XhoI for cloning into pGEX-KG and further expression of the
protein as an N-terminal GST fusion;

- for ORF 76, NheI/BamHI for cloning into the pTRC-HisA vector and further expression
of the protein as an N-terminal His-tag fusion;

- EcoRI/PstI, EcoRI/SalI or SalI/PstI for cloning into pGEX-His and further expression of
the protein as an N-terminal His-tag fusion.

Each purified DNA fragment was incubated (37°C for 3 hours to overnight) with 20 units of each
restriction enzyme (New England Biolabs) in a final volume of either 30 or 40μl in the presence of
the appropriate buffer. The digestion product was then purified using the QIAquick PCR
purification kit, following the manufacturer's instructions, and eluted in a final volume of 30 or
50μl of either water or 10mM Tris-HCl, pH 8.5. The final DNA concentration was determined by
1% agarose gel electrophoresis in the presence of a titrated molecular weight marker.
E) Digestion of the cloning vectors (pET22b, pGEX-KG, pTRC-His A, and pGEX-His)

10μg of plasmid was double-digested with 50 units of each restriction enzyme in a 200μl reaction
volume in the presence of the appropriate buffer by overnight incubation at 37°C. After loading the
whole digestion on a 1% agarose gel, the band corresponding to the digested vector was purified
from the gel using the Qiagen QIAquick Gel Extraction Kit and the DNA was eluted in 50μl of
10mM Tris-HCl, pH 8.5. The DNA concentration was evaluated by measuring the OD260 of the sample,
and adjusted to 50μg/μl. 1μl of plasmid was used for each cloning procedure.

The vector pGEX-His is a modified pGEX-2T vector carrying a region encoding six histidine
residues upstream of the thrombin cleavage site and containing the multiple cloning site of the
vector pTRC99 (Pharmacia).

F) Cloning

The fragments corresponding to each ORF, previously digested and purified, were ligated into both pET22b
and pGEX-KG. In a final volume of 20μl, a 3:1 molar ratio of fragment to vector was ligated using 0.5μl
of NEB T4 DNA ligase (400 units/μl), in the presence of the buffer supplied by the manufacturer.
The reaction was incubated at room temperature for 3 hours. In some experiments, ligation was
performed using the Boehringer "Rapid Ligation Kit", following the manufacturer's instructions.
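For the 3:1 fragment-to-vector molar ratio, the mass of insert to add follows from the relative lengths of insert and vector, since equal masses of DNA contain numbers of molecules in inverse proportion to length. The vector mass and the two lengths in the example are hypothetical illustration values.

```python
def insert_mass_ng(vector_ng: float, vector_bp: int, insert_bp: int,
                   molar_ratio: float = 3.0) -> float:
    """Insert mass (ng) giving `molar_ratio` insert molecules per vector molecule."""
    return vector_ng * (insert_bp / vector_bp) * molar_ratio


if __name__ == "__main__":
    # Hypothetical: 50 ng of a 5,400 bp linearised vector and a 900 bp ORF fragment.
    print(f"{insert_mass_ng(50, 5400, 900):.1f} ng insert")
```

With these values, about 25 ng of fragment gives the 3:1 ratio.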

In order to introduce the recombinant plasmid into a suitable strain, 100μl of E. coli DH5 competent
cells were incubated with the ligase reaction solution for 40 minutes on ice, then at 37°C for 3
minutes, then, after adding 800μl LB broth, again at 37°C for 20 minutes. The cells were then
centrifuged at maximum speed in an Eppendorf microfuge and resuspended in approximately 200μl
of the supernatant. The suspension was then plated on LB ampicillin (100mg/ml).

The screening of the recombinant clones was performed by growing 5 randomly chosen colonies
overnight at 37°C in either 2ml (pGEX or pTRC clones) or 5ml (pET clones) LB broth + 100μg/ml
ampicillin. The cells were then pelleted and the DNA extracted using the Qiagen QIAprep Spin
Miniprep Kit, following the manufacturer's instructions, to a final volume of 30μl. 5μl of each
individual miniprep (approximately 1μg) were digested with either NdeI/XhoI or BamHI/XhoI and
the whole digestion loaded onto a 1-1.5% agarose gel (depending on the expected insert size), in
parallel with the molecular weight marker (1Kb DNA Ladder, GIBCO). Positive clones were
identified on the basis of the correct insert size.
For the cloning of ORFs 110, 111, 113, 115, 119, 122, 125 & 130, the double-digested PCR
product was ligated into double-digested vector using EcoRI-PstI cloning sites or, for ORFs 115
& 127, EcoRI-SalI or, for ORF 122, SalI-PstI. After cloning, the recombinant plasmids were
introduced into the E. coli host W3110. Individual clones were grown overnight at 37°C in L-broth
with 50μl/ml ampicillin.

G) Expression

Each ORF cloned into the expression vector was transformed into the strain suitable for expression
of the recombinant protein product. 1μl of each construct was used to transform 30μl of E. coli
BL21 (pGEX vector), E. coli TOP10 (pTRC vector) or E. coli BL21-DE3 (pET vector), as described
above. In the case of the pGEX-His vector, the same E. coli strain (W3110) was used for initial
cloning and expression. Single recombinant colonies were inoculated into 2ml LB+Amp
(100μg/ml), incubated at 37°C overnight, then diluted 1:30 in 20ml of LB+Amp (100μg/ml) in
100ml flasks, making sure that the OD ranged between 0.1 and 0.15. The flasks were incubated
at 30°C in gyratory water-bath shakers until the OD indicated exponential growth suitable for
induction of expression (0.4-0.8 OD for pET and pTRC vectors; 0.8-1 OD for pGEX and
pGEX-His vectors). For the pET, pTRC and pGEX-His vectors, protein expression was induced by
addition of 1mM IPTG, whereas in the case of the pGEX system the final concentration of IPTG was
0.2mM. After 3 hours incubation at 30°C, the final concentration of the sample was checked by
OD. In order to check expression, 1ml of each sample was removed and centrifuged in a microfuge;
the pellet was resuspended in PBS and analysed by 12% SDS-PAGE with Coomassie Blue staining.
The whole sample was centrifuged at 6000g and the pellet resuspended in PBS for further use.
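The host strains, induction windows and IPTG concentrations given above differ between vectors. Collecting them in one lookup table, as in the sketch below, makes the per-vector logic explicit. The values are those stated in the protocol; the dictionary and function names are illustrative only.

```python
# Per-vector expression parameters, as stated in the protocol above.
EXPRESSION = {
    "pET":      {"host": "E. coli BL21-DE3", "induce_od": (0.4, 0.8), "iptg_mM": 1.0},
    "pTRC":     {"host": "E. coli TOP10",    "induce_od": (0.4, 0.8), "iptg_mM": 1.0},
    "pGEX":     {"host": "E. coli BL21",     "induce_od": (0.8, 1.0), "iptg_mM": 0.2},
    "pGEX-His": {"host": "E. coli W3110",    "induce_od": (0.8, 1.0), "iptg_mM": 1.0},
}


def starting_od_ok(overnight_od: float, dilution: int = 30) -> bool:
    """After the 1:30 dilution the starting OD should lie between 0.1 and 0.15."""
    return 0.1 <= overnight_od / dilution <= 0.15


def ready_to_induce(vector: str, od: float) -> bool:
    """True when the culture OD falls inside the vector's induction window."""
    low, high = EXPRESSION[vector]["induce_od"]
    return low <= od <= high
```

For example, an OD of 0.5 is inside the window for pET but still too low for pGEX.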

H) GST-fusion proteins large-scale purification

A single colony was grown overnight at 37°C on an LB+Amp agar plate. The bacteria were inoculated
into 20ml of LB+Amp liquid culture in a water bath shaker and grown overnight. Bacteria were
diluted 1:30 into 600ml of fresh medium and allowed to grow at the optimal temperature (20-37°C)
to OD550 0.8-1. Protein expression was induced with 0.2mM IPTG followed by three hours
incubation. The culture was centrifuged at 8000rpm at 4°C. The supernatant was discarded and the
bacterial pellet was resuspended in 7.5ml cold PBS. The cells were disrupted by sonication on ice
for 30 sec at 40W using a Branson sonifier B-15, frozen and thawed twice and centrifuged again.
The supernatant was collected and mixed with 150μl Glutathione-Sepharose 4B resin (Pharmacia)
(previously washed with PBS) and incubated at room temperature for 30 minutes. The sample was
centrifuged at 700g for 5 minutes at 4°C. The resin was washed twice with 10ml cold PBS for 10
minutes, resuspended in 1ml cold PBS, and loaded on a disposable column. The resin was washed
twice with 2ml cold PBS until the flow-through reached OD280 of 0.02-0.06. The GST-fusion
protein was eluted by addition of 700μl cold Glutathione elution buffer (10mM reduced
glutathione, 50mM Tris-HCl) and fractions collected until the OD280 was 0.1. 21μl of each fraction
were loaded on a 12% SDS gel using either Biorad SDS-PAGE Molecular weight standard broad
range (M1) (200, 116.25, 97.4, 66.2, 45, 31, 21.5, 14.4, 6.5 kDa) or Amersham Rainbow Marker
(M2) (220, 66, 46, 30, 21.5, 14.3 kDa) as standards. As the MW of GST is 26kDa, this value must
be added to the MW of each GST-fusion protein.
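Because the GST tag contributes 26kDa, the band expected for a GST fusion can be estimated by adding 26kDa to the mass of the cloned ORF product. The minimal sketch below uses standard average residue masses plus one water per chain. It ignores the few linker residues contributed by the vector, so it is an approximation.

```python
# Average residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01528   # one water per chain (free N- and C-termini)
GST_KDA = 26.0     # GST contribution stated in the text


def protein_kda(seq: str) -> float:
    """Average mass in kDa; whitespace, digits and a trailing stop '*' are ignored.

    Non-standard letters such as X raise KeyError rather than being guessed.
    """
    residues = [aa for aa in seq.upper() if aa.isalpha()]
    return (sum(RESIDUE_MASS[aa] for aa in residues) + WATER) / 1000.0


def gst_fusion_kda(seq: str) -> float:
    """Approximate apparent size of the GST fusion of an ORF product."""
    return protein_kda(seq) + GST_KDA
```

Passing a translated ORF (with any omitted signal peptide removed) to `gst_fusion_kda` gives the approximate position of the fusion band relative to the M1 or M2 markers.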

I) His-fusion solubility analysis (ORFs 111-129)

To analyse the solubility of the His-fusion expression products, pellets of 3ml cultures were
resuspended in buffer M1 [500μl PBS pH 7.2]. 25μl lysozyme (10mg/ml) was added and the
bacteria were incubated for 15 min at 4°C. The pellets were sonicated for 30 sec at 40W using a
Branson sonifier B-15, frozen and thawed twice and then separated again into pellet and
supernatant by a centrifugation step. The supernatant was collected and the pellet was resuspended
in buffer M2 [8M urea, 0.5M NaCl, 20mM imidazole and 0.1M NaH2PO4] and incubated for 3 to
4 hours at 4°C. After centrifugation, the supernatant was collected and the pellet was resuspended
in buffer M3 [6M guanidinium-HCl, 0.5M NaCl, 20mM imidazole and 0.1M NaH2PO4] overnight
at 4°C. The supernatants from all steps were analysed by SDS-PAGE.

The proteins expressed from ORFs 113, 119 and 120 were found to be soluble in PBS, whereas
those from ORFs 111, 122, 126 and 129 needed urea and those from ORFs 125 and 127 needed
guanidinium-HCl for their solubilization.
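The stepwise extraction assigns each His-fusion protein to the mildest buffer that releases it into the supernatant. That assignment then determines the buffer used to equilibrate the Ni-NTA column in the large-scale purification. The sketch records the outcomes reported above; the function name is illustrative.

```python
from typing import Optional

# Extraction order, mildest first: PBS; 8M urea; 6M guanidinium-HCl.
BUFFERS = ("M1", "M2", "M3")

# Outcomes reported above for ORFs 111-129 (ORF number -> solubilizing buffer).
REPORTED = {113: "M1", 119: "M1", 120: "M1",
            111: "M2", 122: "M2", 126: "M2", 129: "M2",
            125: "M3", 127: "M3"}


def solubilizing_buffer(found_in_supernatant: set[str]) -> Optional[str]:
    """Return the mildest buffer whose supernatant contained the protein on SDS-PAGE."""
    for buf in BUFFERS:
        if buf in found_in_supernatant:
            return buf
    return None
```

A protein seen in both the M2 and M3 supernatants is classed as urea-soluble, because urea already released it.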

J) His-fusion large-scale purification

A single colony was grown overnight at 37°C on an LB+Amp agar plate. The bacteria were
inoculated into 20ml of LB+Amp liquid culture and incubated overnight in a water bath shaker.
Bacteria were diluted 1:30 into 600ml fresh medium and allowed to grow at the optimal
temperature (20-37°C) to OD550 0.6-0.8. Protein expression was induced by addition of 1mM IPTG
and the culture further incubated for three hours. The culture was centrifuged at 8000rpm at 4°C,
the supernatant was discarded and the bacterial pellet was resuspended in 7.5ml of either (i) cold
buffer A (300mM NaCl, 50mM phosphate buffer, 10mM imidazole, pH 8) for soluble proteins or
(ii) buffer B (urea 8M, 10mM Tris-HCl, 100mM phosphate buffer, pH 8.8) for insoluble proteins.

The cells were disrupted by sonication on ice for 30 sec at 40W using a Branson sonifier B-15,
frozen and thawed two times and centrifuged again.

For insoluble proteins, the supernatant was stored at -20°C, while the pellets were resuspended in 2ml
buffer C (6M guanidine hydrochloride, 100mM phosphate buffer, 10mM Tris-HCl, pH 7.5) and
treated in a homogenizer for 10 cycles. The product was centrifuged at 13000rpm for 40 minutes.

Supernatants were collected and mixed with 150μl Ni2+-resin (Pharmacia) (previously washed with
either buffer A or buffer B, as appropriate) and incubated at room temperature with gentle agitation
for 30 minutes. The sample was centrifuged at 700g for 5 minutes at 4°C. The resin was washed
twice with 10ml buffer A or B for 10 minutes, resuspended in 1ml buffer A or B and loaded on a
disposable column. The resin was washed at either (i) 4°C with 2ml cold buffer A or (ii) room
temperature with 2ml buffer B, until the flow-through reached OD280 of 0.02-0.06.

The resin was washed with either (i) 2ml cold 20mM imidazole buffer (300mM NaCl, 50mM
phosphate buffer, 20mM imidazole, pH 8) or (ii) buffer D (urea 8M, 10mM Tris-HCl, 100mM
phosphate buffer, pH 6.3) until the flow-through reached OD280 of 0.02-0.06. The His-fusion
protein was eluted by addition of 700μl of either (i) cold elution buffer A (300mM NaCl, 50mM
phosphate buffer, 250mM imidazole, pH 8) or (ii) elution buffer B (urea 8M, 10mM Tris-HCl,
100mM phosphate buffer, pH 4.5) and fractions collected until the OD280 was 0.1. 21μl of each
fraction were loaded on a 12% SDS gel.
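The two Ni2+ routes release the His-tagged protein in different ways. Under native conditions the imidazole concentration is raised stepwise (10, 20, then 250mM) so that it competes the tag off the resin. Under denaturing conditions in 8M urea the pH is lowered stepwise (8.8, 6.3, then 4.5), which protonates the histidines so that they no longer bind the nickel. The sketch tabulates the steps given above and checks that ordering; the data layout is illustrative.

```python
# (step, buffer, imidazole mM, pH) for each route, as listed in the protocol above.
NATIVE = [("bind/wash", "buffer A", 10, 8.0),
          ("wash", "20mM imidazole buffer", 20, 8.0),
          ("elute", "elution buffer A", 250, 8.0)]
DENATURING = [("bind/wash", "buffer B", 0, 8.8),
              ("wash", "buffer D", 0, 6.3),
              ("elute", "elution buffer B", 0, 4.5)]

IMIDAZOLE, PH = 2, 3  # column indices in the tuples above


def is_stepwise(route, column: int, increasing: bool) -> bool:
    """True if the chosen column changes strictly in one direction across the steps."""
    values = [row[column] for row in route]
    pairs = list(zip(values, values[1:]))
    if increasing:
        return all(a < b for a, b in pairs)
    return all(a > b for a, b in pairs)
```

Both checks pass: imidazole rises along the native route, and pH falls along the denaturing route.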

K) His-fusion proteins renaturation

10% glycerol was added to the denatured proteins. The proteins were then diluted to 20μg/ml using
dialysis buffer I (10% glycerol, 0.5M arginine, 50mM phosphate buffer, 5mM reduced glutathione,
0.5mM oxidised glutathione, 2M urea, pH 8.8) and dialysed against the same buffer at 4°C for
12-14 hours. The protein was further dialysed against dialysis buffer II (10% glycerol, 0.5M arginine,
50mM phosphate buffer, 5mM reduced glutathione, 0.5mM oxidised glutathione, pH 8.8) for 12-14
hours at 4°C. Protein concentration was evaluated using the formula:

Protein (mg/ml) = (1.55 × OD280) - (0.76 × OD260)
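The concentration formula above, together with the dilution to 20μg/ml that precedes the first dialysis, can be computed as follows. The absorbance readings in the example are hypothetical.

```python
def protein_mg_per_ml(od280: float, od260: float) -> float:
    """Protein (mg/ml) = 1.55 x OD280 - 0.76 x OD260, as given above."""
    return 1.55 * od280 - 0.76 * od260


def final_volume_ml(conc_mg_per_ml: float, volume_ml: float,
                    target_ug_per_ml: float = 20.0) -> float:
    """Total volume after diluting with dialysis buffer I to reach the target concentration."""
    return conc_mg_per_ml * 1000.0 * volume_ml / target_ug_per_ml


if __name__ == "__main__":
    conc = protein_mg_per_ml(od280=0.80, od260=0.45)   # hypothetical readings
    print(f"{conc:.3f} mg/ml; dilute 1 ml to {final_volume_ml(conc, 1.0):.1f} ml")
```

The OD260 term corrects for nucleic acid contamination, which would otherwise inflate the OD280-based estimate.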
L) His-fusion large-scale purification (ORFs 111-129)

500ml of bacterial cultures were induced and the fusion proteins were obtained soluble in buffer
M1, M2 or M3 using the procedure described above. The crude extract of the bacteria was loaded
onto a Ni-NTA superflow column (Qiagen) equilibrated with buffer M1, M2 or M3 depending
on the solubilization buffer of the fusion proteins. Unbound material was eluted by washing the
column with the same buffer. The specific protein was eluted with the corresponding buffer
containing 500mM imidazole and dialysed against the corresponding buffer without imidazole.
After each run the columns were sanitized by washing with at least two column volumes of 0.5M
sodium hydroxide and re-equilibrated before the next use.

M) Mice immunisations

20μg of each purified protein were used to immunise mice intraperitoneally. In the case of ORFs
2, 4, 15, 22, 27, 28, 37, 76, 89 and 97, Balb-C mice were immunised with Al(OH)3 as adjuvant on
days 1, 21 and 42, and the immune response was monitored in samples taken on day 56. For ORFs 44,
106 and 132, CD1 mice were immunised using the same protocol. For ORFs 25 and 40, CD1 mice
were immunised using Freund's adjuvant rather than Al(OH)3, and the same immunisation
protocol was used, except that the immune response was measured on day 42 rather than 56.
Similarly, for ORFs 23, 32, 38 and 79, CD1 mice were immunised with Freund's adjuvant, but the
immune response was measured on day 49.

N) ELISA assay (sera analysis)

The acapsulated MenB M7 strain was plated on chocolate agar plates and incubated overnight at
37°C. Bacterial colonies were collected from the agar plates using a sterile Dacron swab and
inoculated into 7ml of Mueller-Hinton Broth (Difco) containing 0.25% glucose. Bacterial growth
was monitored every 30 minutes by following OD620. The bacteria were allowed to grow until the OD
reached 0.3-0.4. The culture was centrifuged for 10 minutes at 10000rpm. The
supernatant was discarded and the bacteria were washed once with PBS, resuspended in PBS
containing 0.025% formaldehyde, and incubated for 2 hours at room temperature and then
overnight at 4°C with stirring. 100μl of bacterial cells were added to each well of a 96-well Greiner
plate and incubated overnight at 4°C. The wells were then washed three times with PBT washing
buffer (0.1% Tween-20 in PBS). 200μl of saturation buffer (2.7% polyvinylpyrrolidone 10 in
water) was added to each well and the plates incubated for 2 hours at 37°C. Wells were washed
three times with PBT. 200μl of diluted sera (dilution buffer: 1% BSA, 0.1% Tween-20, 0.1% NaN3
in PBS) were added to each well and the plates incubated for 90 minutes at 37°C. Wells were
washed three times with PBT. 100μl of HRP-conjugated rabbit anti-mouse (Dako) serum diluted
1:2000 in dilution buffer were added to each well and the plates were incubated for 90 minutes at
37°C. Wells were washed three times with PBT buffer. 100μl of substrate buffer for HRP (25ml
of citrate buffer pH 5, 10mg of o-phenylenediamine and 10μl of H2O2) were added to each well and the
plates were left at room temperature for 20 minutes. 100μl H2SO4 was added to each well and OD490
was read. The ELISA was considered positive when OD490 was 2.5 times that of the respective
pre-immune serum.
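The positivity criterion compares each immune serum with its own pre-immune serum. The minimal sketch below reads "2.5 times" as "at least 2.5 times", which is an interpretation. The readings in the example are hypothetical.

```python
THRESHOLD = 2.5


def elisa_positive(od490_immune: float, od490_preimmune: float,
                   threshold: float = THRESHOLD) -> bool:
    """Positive when the immune OD490 is at least `threshold` x the pre-immune OD490."""
    return od490_immune >= threshold * od490_preimmune


if __name__ == "__main__":
    readings = {"serum 1": (1.20, 0.15), "serum 2": (0.30, 0.20)}  # hypothetical
    for name, (immune, pre) in readings.items():
        print(name, "positive" if elisa_positive(immune, pre) else "negative")
```

Using a ratio to the matched pre-immune serum, rather than a fixed OD cut-off, compensates for differences in background binding between individual mice.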

O) FACScan bacteria binding assay procedure

The acapsulated MenB M7 strain was plated on chocolate agar plates and incubated overnight at
37°C. Bacterial colonies were collected from the agar plates using a sterile Dacron swab and
inoculated into 4 tubes, each containing 8ml Mueller-Hinton Broth (Difco) with 0.25%
glucose. Bacterial growth was monitored every 30 minutes by following OD620. The bacteria were
allowed to grow until the OD reached 0.35-0.5. The culture was centrifuged for 10 minutes
at 4000rpm. The supernatant was discarded and the pellet was resuspended in blocking buffer (1%
BSA, 0.4% NaN3) and centrifuged for 5 minutes at 4000rpm. Cells were resuspended in blocking
buffer to reach OD620 of 0.07. 100μl of bacterial cells were added to each well of a Costar 96-well
plate. 100μl of diluted (1:200) sera (in blocking buffer) were added to each well and the plates
incubated for 2 hours at 4°C. Cells were centrifuged for 5 minutes at 4000rpm, the supernatant
aspirated and the cells washed by addition of 200μl/well of blocking buffer. 100μl of
R-phycoerythrin-conjugated F(ab')2 goat anti-mouse, diluted 1:100, was added to each well and the plates
incubated for 1 hour at 4°C. Cells were spun down by centrifugation at 4000rpm for 5 minutes and
washed by addition of 200μl/well of blocking buffer. The supernatant was aspirated and the cells
resuspended in 200μl/well of PBS, 0.25% formaldehyde. Samples were transferred to FACScan
tubes and read. The FACScan settings were: FL1 on, FL2 and FL3 off; FSC-H
threshold: 92; FSC PMT Voltage: E 02; SSC PMT: 474; Amp. Gains 7.1; FL-2 PMT: 539;
compensation values: 0.
P) OMV preparations

Bacteria were grown overnight on 5 GC plates, harvested with a loop and resuspended in 10ml 20mM
Tris-HCl. Heat inactivation was performed at 56°C for 30 minutes and the bacteria disrupted by
sonication for 10 minutes on ice (50% duty cycle, 50% output). Unbroken cells were removed by
centrifugation at 5000g for 10 minutes and the total cell envelope fraction recovered by centrifugation
at 50000g at 4°C for 75 minutes. To extract cytoplasmic membrane proteins from the crude outer
membranes, the whole fraction was resuspended in 2% sarkosyl (Sigma) and incubated at room
temperature for 20 minutes. The suspension was centrifuged at 10000g for 10 minutes to remove
aggregates, and the supernatant further ultracentrifuged at 50000g for 75 minutes to pellet the outer
membranes. The outer membranes were resuspended in 10mM Tris-HCl, pH 8 and the protein
concentration measured by the Bio-Rad Protein assay, using BSA as a standard.

Q) Whole Extracts preparation 

Bacteria were grown overnight on a GC plate, harvested with a loop and resuspended in 1ml of 
20mM Tris-HCl. Heat inactivation was performed at 56°C for 30 minutes. 

R) Western blotting

Purified proteins (500ng/lane), outer membrane vesicles (5μg) and total cell extracts (25μg) derived
from MenB strain 2996 were loaded on 15% SDS-PAGE and transferred to a nitrocellulose
membrane. The transfer was performed for 2 hours at 150mA at 4°C, in transfer buffer (0.3%
Tris base, 1.44% glycine, 20% methanol). The membrane was saturated by overnight incubation
at 4°C in saturation buffer (10% skimmed milk, 0.1% Triton X100 in PBS). The membrane was
washed twice with washing buffer (3% skimmed milk, 0.1% Triton X100 in PBS) and incubated
for 2 hours at 37°C with mice sera diluted 1:200 in washing buffer. The membrane was washed
twice and incubated for 90 minutes with a 1:2000 dilution of horseradish peroxidase-labelled
anti-mouse Ig. The membrane was washed twice with 0.1% Triton X100 in PBS and developed with
the Opti-4CN Substrate Kit (Bio-Rad). The reaction was stopped by adding water.

S) Bactericidal assay

MC58 strain was grown overnight at 37°C on chocolate agar plates. 5-7 colonies were collected and
used to inoculate 7ml Mueller-Hinton broth. The suspension was incubated at 37°C on a nutator
and allowed to grow until OD620 was 0.5-0.8. The culture was aliquoted into sterile 1.5ml Eppendorf
tubes and centrifuged for 20 minutes at maximum speed in a microfuge. The pellet was washed
once in Gey's buffer (Gibco) and resuspended in the same buffer to an OD620 of 0.5, diluted
1:20000 in Gey's buffer and stored at 25°C.

50μl of Gey's buffer/1% BSA was added to each well of a 96-well tissue culture plate. 25μl of
diluted mice sera (1:100 in Gey's buffer/0.2% BSA) were added to each well and the plate
incubated at 4°C. 25μl of the previously described bacterial suspension were added to each well.
25μl of either heat-inactivated (56°C waterbath for 30 minutes) or normal baby rabbit complement
were added to each well. Immediately after the addition of the baby rabbit complement, 22μl of
each sample/well were plated on Mueller-Hinton agar plates (time 0). The 96-well plate was
incubated for 1 hour at 37°C with rotation and then 22μl of each sample/well were plated on
Mueller-Hinton agar plates (time 1). After overnight incubation the colonies corresponding to time
0 and time 1 hour were counted.
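The colony counts at time 0 and time 1 hour are commonly summarised as the percentage of bacteria killed during the incubation. The wells with heat-inactivated complement serve as the control for complement-independent loss. That summary statistic is a convention assumed here, since the protocol above specifies only the counting. The counts in the example are hypothetical.

```python
def percent_killing(cfu_t0: int, cfu_t1: int) -> float:
    """Percentage of colony-forming units lost between time 0 and time 1 hour."""
    if cfu_t0 <= 0:
        raise ValueError("time-0 count must be positive")
    return 100.0 * (1.0 - cfu_t1 / cfu_t0)


if __name__ == "__main__":
    active = percent_killing(cfu_t0=200, cfu_t1=40)        # normal complement (hypothetical)
    inactivated = percent_killing(cfu_t0=200, cfu_t1=196)  # heat-inactivated control (hypothetical)
    print(f"killing with complement: {active:.0f}%, control: {inactivated:.0f}%")
```

High killing with active complement combined with little loss in the heat-inactivated control points to complement-mediated bactericidal activity of the serum.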

Table II (page 493) gives a summary of the cloning, expression and purification results.
Example 1 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 1>: 

1 ATGAAACAGA CAGTCAA . AT GCTTGCCGCC GCCCTGATTG CCTTGGGCTT 

51 GAACCGACCG GTGTGGNCGG ATGACGTATC GGATTTTCGG GAAAACTTGC 

101 A . GCGGCAGC ACAGGGAAAT GCAGCAGCCC AATACAATTT GGGCGCAATG 

151 TAT . TACAAA GGACGCGCGT GCGCCGGGAT GATGCTGAAG CGGTCAGATG 

201 GTATCGGCAG CCGGCGGAAC AGGGGTTAGC CCAAGCCCAA TACAATTTGG 

251 GCTGGATGTA TGCCAACGGG CGCGC.GTGC GCCAAGATGA TACCGAAGCG 

301 GTCAGATGGT ATCGGCAGGC GGCAGCGCAG GGGGTTGTCC AAGCCCAATA 

351 CAATTTGGGC GTGATATATG CCGAAGGACG TGGAGTGCGC CAAGACGATG 

401 TCGAAGCGGT CAGATGGTTT CGGCAGGCGG CAGCGCAGGG GGTAGCCCAA 

451 GCCCAAAACA ATTTGGGCGT GATGTATGCC GAAAGANCGC GCGTGCGCCA 

501 AGACCG... 

This corresponds to the amino acid sequence <SEQ ID 2; ORF37>:

1 MKQTVXMLAA ALIALGLNRP VWXDDVSDFR ENLXAAAQGN AAAQYNLGAM 

51 YXQRTRVRRD DAEAVRWYRQ PAEQGLAQAQ YNLGWMYANG RXVRQDDTEA 

101 VRWYRQAAAQ GVVQAQYNLG VIYAEGRGVR QDDVEAVRWF RQAAAQGVAQ

151 AQNNLGVMYA ERXRVRQD. . . 

Further work revealed the complete nucleotide sequence <SEQ ID 3>: 

1 ATGAAACAGA CAGTCAAATG GCTTGCCGCC GCCCTGATTG CCTTGGGCTT 

51 GAACCGAGCG GTGTGGGCGG ATGACGTATC GGATTTTCGG GAAAACTTGC 

101 AGGCGGCAGC ACAGGGAAAT GCAGCAGCCC AATACAATTT GGGCGCAATG 

151 TATTACAAAG GACGCGGCGT GCGCCGGGAT GATGCTGAAG CGGTCAGATG 

201 GTATCGGCAG GCGGCGGAAC AGGGGTTAGC CCAAGCCCAA TACAATTTGG 

251 GCTGGATGTA TGCCAACGGG CGCGGCGTGC GCCAAGATGA TACCGAAGCG 

301 GTCAGATGGT ATCGGCAGGC GGCAGCGCAG GGGGTTGTCC AAGCCCAATA 

351 CAATTTGGGC GTGATATATG CCGAAGGACG TGGAGTGCGC CAAGACGATG 

401 TCGAAGCGGT CAGATGGTTT CGGCAGGCGG CAGCGCAGGG GGTAGCCCAA 

451 GCCCAAAACA ATTTGGGCGT GATGTATGCC GAAAGACGCG GCGTGCGCCA 

501 AGACCGCGCC CTTGCACAAG AATGGTTTGG CAAGGCTTGT CAAAACGGAG 

551 ACCAAGACGG CTGCGACAAT GACCAACGCC TGAAGGCGGG TTATTGA 
This corresponds to the amino acid sequence <SEQ ID 4; ORF37-1>:

1 MKQTVKWLAA ALIALGLNRA VWADDVSDFR ENLQAAAQGN AAAQYNLGAM

51 YYKGRGVRRD DAEAVRWYRQ AAEQGLAQAQ YNLGWMYANG RGVRQDDTEA

101 VRWYRQAAAQ GVVQAQYNLG VIYAEGRGVR QDDVEAVRWF RQAAAQGVAQ

151 AQNNLGVMYA ERRGVRQDRA LAQEWFGKAC QNGDQDGCDN DQRLKAGY*
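Each protein listing is the frame-1 translation of the DNA listing that precedes it under the standard genetic code, with '*' for a stop codon and X where the partial sequence contains an unresolved base (for example, the '.' in codon 6 of SEQ ID 1 gives the X at position 6 of SEQ ID 2). A minimal Python sketch of that conversion follows; it assumes the listings are read in frame 1 from position 1, and the helper name `translate` is hypothetical, not software used in the source.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI translation table 1); codons enumerated in
# TCAG order with the third position varying fastest (TTT, TTC, TTA, ...).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna: str) -> str:
    """Translate a DNA listing in frame 1.

    Position numbers and spaces are discarded; any codon containing a
    character other than A/C/G/T (e.g. N or '.') becomes 'X'; stops become '*'.
    """
    seq = "".join(ch for ch in dna.upper() if ch.isalpha() or ch == ".")
    usable = len(seq) - len(seq) % 3          # ignore a trailing partial codon
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, usable, 3))


# First line of SEQ ID 3 (complete) and of SEQ ID 1 (partial, with '.').
print(translate("1 ATGAAACAGA CAGTCAAATG GCTTGCCGCC"))   # MKQTVKWLAA
print(translate("1 ATGAAACAGA CAGTCAA.AT GCTTGCCGCC"))   # MKQTVXMLAA
```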

Further work identified the corresponding gene in strain A of N. meningitidis <SEQ ID 5>: 

1 ATGAAACAGA CAGTCAAATG GCTTGCCGCC GCCCTGATTG CCTTGGGCTT 

51 GAACCAAGCG GTGTGGGCGG ATGACGTATC GGATTTTCGG GAAAACTTGC 

101 AGGCGGCAGC ACAGGGAAAT GCAGCAGCCC AAAACAATTT GGGCGTGATG 

151 TATGCCGAAA GACGCGGCGT GCGCCAAGAC CGCGCCCTTG CACAAGAATG

201 GCTTGGCAAG GCTTGTCAAA ACGGATACCA AGACAGCTGC GACAATGACC 

251 AACGCCTGAA AGCGGGTTAT TGA 

This encodes a protein having amino acid sequence <SEQ ID 6; ORF37a>: 

1 MKQTVKWLAA ALIALGLNQA VWADDVSDFR ENLQAAAQGN AAAQNNLGVM
51 YAERRGVRQD RALAQEWLGK ACQNGYQDSC DNDQRLKAGY*



The originally-identified partial strain B sequence (ORF37) shows 68.0% identity over a 75aa 
overlap with ORF37a: 

10 20 30 40 50 60 

orf37.pep  MKQTVXMLAAALIALGLNRPVWXDDVSDFRENLXAAAQGNAAAQYNLGAMYXQRTRVRRD

I I I I I I I I I I I I I I I I : I I I I I I I I I I 1 I I I I I I I I I I I I I I : I I : I I I : I
orf37a     MKQTVKWLAAALIALGLNQAVWADDVSDFRENLQAAAQGNAAAQNNLGVMYAERRGVRQD

10 20 30 40 50 60

70 80 90 100 110 120

orf37.pep  DAEAVRWYRQPAEQGLAQAQYNLGWMYANGRXVRQDDTEAVRWYRQAAAQGVVQAQYNLG

I I : I : I
orf37a     RALAQEWLGKACQNGYQDSCDNDQRLKAGYX

70 80 90 

Further work identified the corresponding gene in N. gonorrhoeae <SEQ ID 7>:

1 ATGAAACAGA CAGTCAAATG GCTTGCCGCC GCCCTGATTG CCTTGGGCTT 

51 GAACCAAGCG GTGTGGGCGG GTGACGTATC GGATTTTCGG GAAAACTTGC 

101 AGgcggcaGA ACaggGAAAT GCAGCAGCCC AATTCAATTT GGGCGTGATG 

151 TATGAAAATG GACAAGGAGT TCGTCAAGAT TATGTACAGG CAGTGCAGTG 

201 GTATCGCAAG GCTTCAGAAC AAGGGGATGC CCAAGCCCAA TACAATTTGG

251 GCTTGATGTA TTACGATGGA CGCGGCGTGC GCCAAGACCT TGCGCTCGCT 

301 CAACAATGGC TTGGCAAGGC TTGTCAAAAC GGAGACCAAA ACAGCTGCGA 

351 CAATGACCAA CGCCTGAAGG CGGGTTATTA A 

This encodes a protein having amino acid sequence <SEQ ID 8; ORF37ng>: 

1 MKQTVKWLAA ALIALGLNQA VWAGDVSDFR ENLQAAEQGN AAAQFNLGVM

51 YENGQGVRQD YVQAVQWYRK ASEQGDAQAQ YNLGLMYYDG RGVRQDLALA 
101 QQWLGKACQN GDQNSCDNDQ RLKAGY* 

The originally-identified partial strain B sequence (ORF37) shows 64.9% identity over a 111 aa
overlap with ORF37ng:

orf37.pep  MKQTVXMLAAALIALGLNRPVWXDDVSDFRENLXAAAQGNAAAQYNLGAMYXQRTRVRRD  60

I I I I I I I I I I I I I I I I : I I I I I I I I I I I I I I I I I I I I : I I I : I I : I I : I
orf37ng    MKQTVKWLAAALIALGLNQAVWAGDVSDFRENLQAAEQGNAAAQFNLGVMYENGQGVRQD  60

orf37.pep  DAEAVRWYRQPAEQGLAQAQYNLGWMYANGRXVRQDDTEAVRWYRQAAAQGVVQAQYNLG  120
: : I I : I I t : : I I I I I I I I I I I I I : I t I I I I : I : I : I : I

orf37ng    YVQAVQWYRKASEQGDAQAQYNLGLMYYDGRGVRQDLALAQQWLGKACQNGDQNSCDNDQ  120

orf37.pep  VIYAEGRGVRQDDVEAVRWFRQAAAQGVAQAQNNLGVMYAERXRVRQD  168
orf37ng    RLKAGY  126
The complete strain B sequence (ORF37-1) and ORF37ng show 51.5% identity in 198 aa overlap:

10 20 30 40 50 60

orf37-1.pep  MKQTVKWLAAALIALGLNRAVWADDVSDFRENLQAAAQGNAAAQYNLGAMYYKGRGVRRD
| | | | | I I I I I I I I I I I I I : I I I I I II I I I I I I I I I 1111111:111:11 : I : I I I : I
orf37ng      MKQTVKWLAAALIALGLNQAVWAGDVSDFRENLQAAEQGNAAAQFNLGVMYENGQGVRQD

10 20 30 40 50 60

70 80 90 100 110 120

orf37-1.pep  DAEAVRWYRQAAEQGLAQAQYNLGWMYANGRGVRQDDTEAVRWYRQAAAQGVVQAQYNLG
:: I I : I I I : I : I I I I I I I I I I I I I : I I I I I I I

orf37ng      YVQAVQWYRKASEQGDAQAQYNLGLMYYDGRGVRQD

70 80 90

130 140 150 160 170 180

orf37-1.pep  VIYAEGRGVRQDDVEAVRWFRQAAAQGVAQAQNNLGVMYAERRGVRQDRALAQEWFGKAC

I I I I : I : I I I I

orf37ng      LALAQQWLGKAC

100

190 199

orf37-1.pep  QNGDQDGCDNDQRLKAGYX
             |||||::||||||||||||
orf37ng      QNGDQNSCDNDQRLKAGYX
110 120
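The identity figures quoted for each pair (for example, 51.5% in a 198 aa overlap) are the number of identical aligned residues divided by the overlap length. The source does not name the alignment program, and programs differ in how they count gaps, so the Python sketch below assumes one simple convention: every alignment column in which at least one sequence has a residue belongs to the overlap, and a column is identical when both rows carry the same residue (the trailing X that marks the stop is counted like any other character). The function name is hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> tuple[int, int, float]:
    """Identity over a pre-computed pairwise alignment of equal-length rows.

    Returns (identical columns, overlap length, percent identity). Columns in
    which both rows are gaps are skipped; this is an assumed convention.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    overlap = identical = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap and b == gap:
            continue                      # empty column: not part of the overlap
        overlap += 1
        if a == b and a != gap:
            identical += 1
    return identical, overlap, (100.0 * identical / overlap if overlap else 0.0)


# Final block of the ORF37-1 / ORF37ng alignment: two substitutions in 19 columns.
print(percent_identity("QNGDQDGCDNDQRLKAGYX", "QNGDQNSCDNDQRLKAGYX"))
```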

Computer analysis of these amino acid sequences indicates a putative leader sequence, and it was
predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be
useful antigens for vaccines or diagnostics, or for raising antibodies.

ORF37-1 (11 kDa) was cloned in pET and pGEX vectors and expressed in E. coli, as described
above. The products of protein expression and purification were analyzed by SDS-PAGE. Figure
1A shows the results of affinity purification of the GST-fusion protein, and Figure 1B shows the
results of expression of the His-fusion in E. coli. Purified GST-fusion protein was used to immunise
mice, whose sera were used for ELISA (positive result), FACS analysis (Figure 1C), and a
bactericidal assay (Figure 1D). These experiments confirm that ORF37-1 is a surface-exposed
protein, and that it is a useful immunogen.

Figure 1E shows plots of hydrophilicity, antigenic index, and AMPHI regions for ORF37-1.
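The text does not say which hydrophilicity scale or window size was used for Figure 1E. As an illustration only, the Python sketch below computes a sliding-window hydropathy profile with the Kyte-Doolittle scale (an assumed choice; positive values are hydrophobic, so hydrophilic stretches appear as negative values). The function name and the default window of 9 residues are assumptions.

```python
# Kyte & Doolittle (1982) hydropathy values; positive = hydrophobic.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein: str, window: int = 9) -> list[float]:
    """Mean hydropathy of every full window, one value per window start.

    Characters outside the 20 standard residues (X, '*') are skipped when
    averaging; a window with no scorable residue yields 0.0.
    """
    seq = protein.upper()
    profile = []
    for start in range(len(seq) - window + 1):
        scores = [KYTE_DOOLITTLE[aa] for aa in seq[start:start + window] if aa in KYTE_DOOLITTLE]
        profile.append(sum(scores) / len(scores) if scores else 0.0)
    return profile


# N-terminal stretch of ORF37-1 (SEQ ID 4), where the putative leader lies.
profile = hydropathy_profile("MKQTVKWLAAALIALGLNRAVWA")
```

Plotting `profile` against residue position gives the familiar hydropathy curve; negating it gives a hydrophilicity-style plot.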
Example 2 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 9>:

TTCGGCGA CATCGGCGGT TTGAAGGTCA ATGCCCCCGT CAAATCCGCA 

GGCGTATTGG TCGGGCGCGT CGGCGCTATC GGACTTGACC CGAAATCCTA 

TCAGGCGAGG GTGCGCCTCG ATTTGGACGG CAAGTATCAG TTCAGCAGCG

ACGTTTCCGC GCAAATCCTG ACTTCsGGAC TTTTGGGCGA GCAGTACATC

GGGCTGCAGC AGGGCGGCGA CACGGAAAAC CTTGCTGCCG GCGACACCAT

CTCCGTAACC AGTTCTGCAA TGGTTCTGGA AAACCTTATC GGCAAATTCA

TGACGAGTTT TGCCGAGAAA AATGCCGACG GCGGCAATGC GGAAAAAGCC
GCCGAATAA

This corresponds to the amino acid sequence <SEQ ID 10>: 

1 FGDIGGLKVN APVKSAGVLV GRVGAIGLDP KSYQARVRLD LDGKYQFSSD 
51 VSAQILTSGL LGEQYIGLQQ GGDTENLAAG DTISVTSSAM VLENLIGKFM 
101 TSFAEKNADG GNAEKAAE*

Computer analysis of this amino acid sequence gave the following results: 

Homology with a hypothetical H. influenzae protein (yrbd.haein; accession number P45029)
SEQ ID 9 and yrbd.haein show 48.4% aa identity in 122 aa overlap:

20 30 40 50 60 70 

yrbd.h  LGIGALVFLGLRVANVQGFAETKSYTVTATFDNIGGLKVRAPLKIGGWIGRVSAITLDE

I:: M I I I I: I I: I : I I : : I I I : I I : I I
N.m                                   FGDIGGLKVNAPVKSAGVLVGRVGAIGLDP

10 20 30

80 90 100 110 120 130

yrbd.h  KSYLPKVSIAINQEYNEIPENSSLSIKTSGLLGEQYIALTMGFDDGDTAMLKNGSQIQDT
Ml : I : : : : : I I I I I I I I I I I I : I I I I I : I : I : I I

N.m     KSYQARVRLDLDGKY-QFSSDVSAQILTSGLLGEQYIGLQQG GDTENLAAGDT I SVT

40 50 60 70 80

140 150 160

yrbd.h  TSAMVLEDLIGQFL--YGSKKSDGNEKSESTEQ
:||tll|:|||:|: :::|::||:: ::::|:
N.m     SSAMVLENLIGKFMTSFAEKNADGGNAEKAAEX

90 100 110 120

Homology with a predicted ORF from N. gonorrhoeae

SEQ ID 9 shows 99.2% identity over a 118 aa overlap with a predicted ORF from N. gonorrhoeae:

20 30 40 50 60 70

yrbd  GAAAVAFLAFRVAGGAAFGGSDKTYAVYADFGDIGGLKVNAPVKSAGVLVGRVGAIGLDP
                                    ||||||||||||||||||||||||||||||
N.m                                 FGDIGGLKVNAPVKSAGVLVGRVGAIGLDP

10 20 30

80 90 100 110 120 130

yrbd  KSYQARVRLDLDGKYQFSSDVSAQILTSGLLGEQYIGLQQGGDTENLAAGDTISVTSSAM
      ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
N.m   KSYQARVRLDLDGKYQFSSDVSAQILTSGLLGEQYIGLQQGGDTENLAAGDTISVTSSAM
40 50 60 70 80 90

140 150 160

yrbd  VLENLIGKFMTSFAEKNAEGGNAEKAAEX
      ||||||||||||||||||:||||||||||
N.m   VLENLIGKFMTSFAEKNADGGNAEKAAEX

100 110 120

The complete yrbd H. influenzae sequence has a leader sequence, and it is expected that the
full-length homologous N. meningitidis protein will also have one. This suggests that it is either a
membrane protein, a secreted protein or a surface protein, and that the protein, or one of its
epitopes, could be a useful antigen for vaccines or diagnostics.



Example 3 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 11>:

1 ..ATTTTGATAT ACCTCATCCG CAAGAATCTA GGTTCGCCCG TCTTCTTCTT 

51 TCAGGAACGC CCCGGAAAGG ACGGAAAACC TTTTAAAATG GTCAAATTCC 

101 GTTCCATGCG CGACGGCTTG TATTCAGACG GCATTCCGCT GCCCGACGGA

151 GAACGCCTGA CACCGTTCGG CAAAAAACTG CGTGCCGcCA GTwTGGACGA 

201 ACTGCCTGAA TTATGGAATA TCTTAAAAGG CGAGATGAGC CTGGTCGGCC 

251 CCCGCCCGCT GCTGATGCAA TATCTGCCGC TGTACGACAA CTTCCAAAAC 

301 CGCCGCCACG AAATGAAACC CGGCATTACC GGCTGGGCGC AGGTCAACGG

351 GCGCAACGCg CTTTCGTGGG ACGAAAAATT CGCCTGCGAT GTTTGGTATA 

401 TCGACCACTT CAGCCTGTGC CTCGACATCA AAATCCTACT GCTGACGGTT 

451 AAAAAAGTAT TAATCAAGGA AGGGATTTCC GCACAGGGCG AACA.aCCAT 

501 GCCCCCTTTC ACAGGAAAAC GCAAACTCGC CGTCGTCGGT GCGGGCGGAC 

551 ACGGAAAAGT CGTTGCCGAC CTTGCCGCCG CACTCGGCCG GTACAGGGAA 

601 ATCGTTTTTC TGGACGACCG CGCACAAGGC AGCGTCAACG GCTTTTCCGT 

651 CATCGGCACG ACGCTGCTGC TTGAAAACAG TTTATCGCCC GAACAATACG 

701 ACGTCGCCGT CGCCGTCGGC AACAACCGCA TCCGCCGCCA AATCGCCGAA 

751 AAAGCCGCCG CGCTCGGCTT CGCCCTGCCC GTACTGGTTC ATCCGGACGC 

801 GACCGTCTCG CCTTCTGCAA CAGTCGGACA AGGCAGCGTC GTTATGGCGA 

851 AAGCGGTCG. . 

This corresponds to the amino acid sequence <SEQ ID 12; ORF3>: 



1 ..ILIYLIRKNL GSPVFFFQER PGKDGKPFKM VKFRSMRDGL YSDGIPLPDG

51 ERLTPFGKKL RAASXDELPE LWNILKGEMS LVGPRPLLMQ YLPLYDNFQN 

101 RRHEMKPGIT GWAQVNGRNA LSWDEKFACD VWYIDHFSLC LDIKILLLTV 

151 KKVLIKEGIS AQGEXTMPPF TGKRKLAVVG AGGHGKVVAD LAAALGRYRE

201 IVFLDDRAQG SVNGFSVIGT TLLLENSLSP EQYDVAVAVG NNRIRRQIAE 

251 KAAALGFALP VLVHPDATVS PSATVGQGSV VMAKAV..

Further sequence analysis revealed the complete nucleotide sequence <SEQ ID 13>: 



1 ATGAGTAAAT TCTTCAAACG CCTGTTTGAC ATTGTTGCCT CCGCCTCGGG 

51 ACTGATTTTC CTCTCGCCAG TATTTTTGAT TTTGATATAC CTCATCCGCA 

101 AGAATCTAGG TTCGCCCGTC TTCTTCTTTC AGGAACGCCC CGGAAAGGAC 

151 GGAAAACCTT TTAAAATGGT CAAATTCCGT TCCATGCGCG ACGCGCTTGA 

201 TTCAGACGGC ATTCCGCTGC CCGACGGAGA ACGCCTGACA CCGTTCGGCA 

251 AAAAACTGCG TGCCGCCAGT TTGGACGAAC TGCCTGAATT ATGGAATATC 

301 TTAAAAGGCG AGATGAGCCT GGTCGGCCCC CGCCCGCTGC TGATGCAATA 

351 TCTGCCGCTG TACGACAACT TCCAAAACCG CCGCCACGAA ATGAAACCCG 

401 GCATTACCGG CTGGGCGCAG GTCAACGGGC GCAACGCGCT TTCGTGGGAC 

451 GAAAAATTCG CCTGCGATGT TTGGTATATC GACCACTTCA GCCTGTGCCT 

501 CGACATCAAA ATCCTACTGC TGACGGTTAA AAAAGTATTA ATCAAGGAAG 

551 GGATTTCCGC ACAGGGCGAA GCCACCATGC CCCCTTTCAC AGGAAAACGC 

601 AAACTCGCCG TCGTCGGTGC GGGCGGACAC GGAAAAGTCG TTGCCGACCT 

651 TGCCGCCGCA CTCGGCCGGT ACAGGGAAAT CGTTTTTCTG GACGACCGCG 

701 CACAAGGCAG CGTCAACGGC TTTTCCGTCA TCGGCACGAC GCTGCTGCTT 

751 GAAAACAGTT TATCGCCCGA ACAATACGAC GTCGCCGTCG CCGTCGGCAA 

801 CAACCGCATC CGCCGCCAAA TCGCCGAAAA AGCCGCCGCG CTCGGCTTCG 

851 CCCTGCCCGT TCTGGTTCAT CCGGACGCGA CCGTCTCGCC TTCTGCAACA 

901 GTCGGACAAG GCAGCGTCGT TATGGCGAAA GCCGTCGTAC AGGCAGGCAG 

951 CGTATTGAAA GACGGCGTGA TTGTGAACAC TGCCGCCACC GTCGATCACG 

1001 ACTGCCTGCT TAACGCTTTC GTCCACATCA GCCCAGGCGC GCACCTGTCG 

1051 GGCAACACGC ATATCGGCGA AGAAAGCTGG ATAGGCACGG GCGCGTGCAG 

1101 CCGCCAGCAG ATCCGTATCG GCAGCCGCGC AACCATTGGA GCGGGCGCAG 

1151 TCGTCGTACG CGACGTTTCA GACGGCATGA CCGTCGCGGG CAATCCGGCA 

1201 AAGCCGCTGC CGCGCAAAAA CCCCGAGACC TCGACAGCAT AA 

This corresponds to the amino acid sequence <SEQ ID 14; ORF3-1>:

1 MSKFFKRLFD IVASASGLIF LSPVFLILIY LIRKNLGSPV FFFQERPGKD

51 GKPFKMVKFR SMRDALDSDG IPLPDGERLT PFGKKLRAAS LDELPELWNI

101 LKGEMSLVGP RPLLMQYLPL YDNFQNRRHE MKPGITGWAQ VNGRNALSWD

151 EKFACDVWYI DHFSLCLDIK ILLLTVKKVL IKEGISAQGE ATMPPFTGKR

201 KLAVVGAGGH GKVVADLAAA LGRYREIVFL DDRAQGSVNG FSVIGTTLLL

251 ENSLSPEQYD VAVAVGNNRI RRQIAEKAAA LGFALPVLVH PDATVSPSAT

301 VGQGSVVMAK AVVQAGSVLK DGVIVNTAAT VDHDCLLNAF VHISPGAHLS

351 GNTHIGEESW IGTGACSRQQ IRIGSRATIG AGAVVVRDVS DGMTVAGNPA

401 KPLPRKNPET STA* 
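The complete ORF3-1 sequence (SEQ ID 13) runs from an ATG start codon to an in-frame TAA stop, which is what distinguishes a complete reading frame from the partial ORF3 sequence. A minimal Python sketch of locating the longest forward-strand ATG-to-stop reading frame follows; it is an illustration under stated assumptions (forward strand only, ATG as the only start codon, TAA/TAG/TGA as stops), and the function name is hypothetical rather than the tool used in the source.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(dna: str) -> str:
    """Longest ATG...stop reading frame on the forward strand, stop included.

    Position numbers and spaces are discarded. Reading frames with no in-frame
    stop before the end of the sequence are ignored; returns "" if none found.
    """
    seq = "".join(ch for ch in dna.upper() if ch.isalpha())
    best = ""
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i                      # first ATG opens the frame
            elif start is not None and codon in STOP_CODONS:
                orf = seq[start:i + 3]
                if len(orf) > len(best):
                    best = orf
                start = None                   # look for the next ATG
    return best


print(longest_orf("CCATGAAATAGCC"))   # ATGAAATAG
```

Keeping the first ATG after each stop, rather than every internal ATG, is what makes the returned frame the longest one in that reading frame.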

Computer analysis of this amino acid sequence gave the following results: 



Homology with a predicted ORF from N. meningitidis (strain A) 

ORF3 shows 93.0% identity over a 286aa overlap with an ORF (ORF3a) from strain A of N. 
meningitidis: 



10 20 30

orf3.pep                             ILIYLIRKNLGSPVFFFQERPGKDGKPFKMVKFR
I I I I I It I I I I I I I I I I I I I I I I I I I I I I I ( I I I
orf3a     MSKFFKRLFDIVASASGLIFLSPVFLILIYLIRKNLGSPVFFFQERPGKDGKPFKMVKFR

10 20 30 40 50 60

40 50 60 70 80 90

orf3.pep  SMRDGLYSDGIPLPDGERLTPFGKKLRAASXDELPELWNILKGEMSLVGPRPLLMQYLPL
I I : I : I I I I I 1 I I I I I I I I I I I I I I I I I 1 i I I I I I I I : I I I : I I I I I I I I I I I I I I I I
orf3a     SMHDALDSDGILLPDGERLTPFGKKLRAASLDELPELWNVLKGDMSLVGPRPLLMQYLPL

70 80 90 100 110 120

100 110 120 130 140 150

orf3.pep  YDNFQNRRHEMKPGITGWAQVNGRNALSWDEKFACDVWYIDHFSLCLDIKILLLTVKKVL

I I I I I I I i I I I I I I I I I I I I I I I I I II I I I I : I I I I : I I I I 1 I I I I I I I I I

orf3a     YDNFQNRRHEMKPGITGWAQVNGRNALSWDERFACDIWYIDHFSLCLDIKILLLTVKKVL

130 140 150 160 170 180

160 170 180 190 200 210

orf3.pep  IKEGISAQGEXTMPPFTGKRKLAVVGAGGHGKVVADLAAALGRYREIVFLDDRAQGSVNG
i I , .11 1 I 1 I I I I 1 I 1 I I J I I 1 t 1 I 1 I I 1 I r t I I I I I I II I I I I 11:1 I II M
orf3a     IKEGISAQGEATMPPFTGKRKLAVVGAGGHGKVVAELAAALGTYGEIVFLDDRVQGSVNG

190 200 210 220 230 240

220 230 240 250 260 270

orf3.pep  FSVIGTTLLLENSLSPEQYDVAVAVGNNRIRRQIAEKAAALGFALPVLVHPDATVSPSAT
I I 1 M I I I M I I I I I M ' I : I I ! 1 I I I I I I I I I I 1 I I I I i M I I I I I : I I I : I I M I I!
orf3a     FPVIGTTLLLENSLSPEQFDIAVAVGNNRIRRQIAEKAAALGFALPVLIHPDSTVSPSAT

250 260 270 280 290 300

280

orf3.pep  VGQGSVVMAKAV
1111:1111111

orf3a     VGQGGVVMAKAVVQADSVLKDGVIVNTAATVDHDCLLDAFVHISPGAHLSGNTRIGEESW

310 320 330 340 350 360

The complete length ORF3a nucleotide sequence <SEQ ID 15> is: 

1 ATGAGTAAAT TCTTCAAACG CCTGTTTGAC ATTGTTGCCT CCGCCTCGGG 

51 ACTGATTTTC CTCTCGCCAG TATTTTTGAT TTTGATATAC CTCATCCGCA 

101 AGAATCTGGG TTCGCCCGTC TTCTTCTTTC AGGAACGCCC CGGAAAGGAC 

151 GGAAAACCTT TTAAAATGGT CAAATTCCGT TCCATGCACG ACGCGCTTGA 

201 TTCAGACGGC ATTCTGCTGC CCGACGGAGA ACGCCTGACA CCGTTCGGCA 

251 AAAAACTGCG TGCCGCCAGT TTGGACGAAC TGCCCGAACT GTGGAACGTC 

301 CTCAAAGGCG ACATGAGCCT GGTCGGCCCC CGCCCGCTGC TGATGCAATA 

351 TCTGCCGCTG TACGACAACT TCCAAAACCG CCGCCACGAA ATGAAACCGG 

401 GCATTACCGG CTGGGCGCAG GTCAACGGGC GCAACGCGCT TTCGTGGGAC

451 GAACGCTTCG CATGCGACAT CTGGTATATC GACCACTTCA GCCTGTGCCT 

501 CGACATCAAA ATCCTACTGC TGACGGTTAA AAAAGTATTA ATCAAAGAAG 

551 GGATTTCCGC ACAGGGCGAA GCCACCATGC CCCCTTTCAC AGGAAAACGC 

601 AAACTTGCCG TCGTCGGTGC GGGCGGACAC GGCAAAGTCG TTGCCGAGCT 

651 TGCCGCCGCA CTCGGCACAT ACGGCGAAAT CGTTTTTCTG GACGACCGCG 

701 TCCAAGGCAG CGTCAACGGC TTCCCCGTCA TCGGCACGAC GCTGCTGCTT 

751 GAAAACAGTT TATCGCCCGA ACAATTCGAC ATCGCCGTCG CCGTCGGCAA 

801 CAACCGCATC CGCCGCCAAA TCGCCGAAAA AGCCGCCGCG CTCGGCTTCG 

851 CCCTGCCCGT CCTGATTCAT CCGGACTCGA CCGTCTCGCC TTCTGCAACA 

901 GTCGGACAAG GCGGCGTCGT TATGGCGAAA GCCGTCGTAC AGGCTGACAG 

951 CGTATTGAAA GACGGCGTAA TTGTGAACAC TGCCGCCACC GTCGATCACG 

1001 ATTGCCTGCT TGATGCTTTC GTCCACATCA GCCCGGGCGC GCACCTGTCG 

1051 GGCAACACGC GTATCGGCGA AGAAAGCTGG ATAGGCACAG GCGCGTGCAG 

1101 CCGCCAGCAG ATCCGTATCG GCAGCCGCGC AACCATTGGA GCGGGCGCAG 

1151 TCGTCGTGCG CGACGTTTCA GACGGCATGA CCGTCGCGGG CAACCCGGCA 

1201 AAACCATTGG CAGGCAAAAA TACCGAGACC CTGCGGTCGT AA 

This is predicted to encode a protein having amino acid sequence <SEQ ID 16>: 



1 MSKFFKRLFD IVASASGLIF LSPVFLILIY LIRKNLGSPV FFFQERPGKD

51 GKPFKMVKFR SMHDALDSDG ILLPDGERLT PFGKKLRAAS LDELPELWNV

101 LKGDMSLVGP RPLLMQYLPL YDNFQNRRHE MKPGITGWAQ VNGRNALSWD

151 ERFACDIWYI DHFSLCLDIK ILLLTVKKVL IKEGISAQGE ATMPPFTGKR

201 KLAVVGAGGH GKVVAELAAA LGTYGEIVFL DDRVQGSVNG FPVIGTTLLL

251 ENSLSPEQFD IAVAVGNNRI RRQIAEKAAA LGFALPVLIH PDSTVSPSAT 



WO 99/24578 PCT/IB98/01665 


301 VGQGGVVMAK AVVQADSVLK DGVIVNTAAT VDHDCLLDAF VHISPGAHLS
351 GNTRIGEESW IGTGACSRQQ IRIGSRATIG AGAVVVRDVS DGMTVAGNPA
401 KPLAGKNTET LRS* 

Two transmembrane domains are underlined. 
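The text does not say how the two transmembrane domains were predicted. One classical approach, shown here only as an assumed illustration, is the Kyte-Doolittle method: average the hydropathy over a 19-residue window and flag windows whose mean exceeds about 1.6. The Python sketch below merges overlapping flagged windows into candidate segments; the function name and defaults are assumptions, and its output need not match the domains identified in the source.

```python
# Kyte & Doolittle (1982) hydropathy values; positive = hydrophobic.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def candidate_tm_segments(protein: str, window: int = 19,
                          threshold: float = 1.6) -> list[tuple[int, int]]:
    """1-based (start, end) spans whose windowed mean hydropathy exceeds threshold.

    Overlapping or adjacent flagged windows are merged into one segment.
    Non-standard characters (X, '*') are ignored when averaging.
    """
    seq = protein.upper()
    segments: list[tuple[int, int]] = []
    for i in range(len(seq) - window + 1):
        scores = [KYTE_DOOLITTLE[aa] for aa in seq[i:i + window] if aa in KYTE_DOOLITTLE]
        if not scores or sum(scores) / len(scores) <= threshold:
            continue
        start, end = i + 1, i + window
        if segments and start <= segments[-1][1] + 1:
            segments[-1] = (segments[-1][0], end)   # extend the current segment
        else:
            segments.append((start, end))
    return segments


# Synthetic test protein: a 25-residue hydrophobic core between charged flanks.
print(candidate_tm_segments("K" * 10 + "L" * 25 + "K" * 10))   # [(6, 40)]
```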



ORF3-1 shows 94.6% identity in 410 aa overlap with ORF3a:

10 20 30 40 50 60

orf3a.pep  MSKFFKRLFDIVASASGLIFLSPVFLILIYLIRKNLGSPVFFFQERPGKDGKPFKMVKFR

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I i I I I I I I I
orf3-1     MSKFFKRLFDIVASASGLIFLSPVFLILIYLIRKNLGSPVFFFQERPGKDGKPFKMVKFR
10 20 30 40 50 60

70 80 90 100 110 120

orf3a.pep  SMHDALDSDGILLPDGERLTPFGKKLRAASLDELPELWNVLKGDMSLVGPRPLLMQYLPL
I I : I I I I I I I I M 1 I I I I I I I I I I I 1 I I I I I I I I I I I I : j I I : I f I I I i I I I I I I I I I I
orf3-1     SMRDALDSDGIPLPDGERLTPFGKKLRAASLDELPELWNILKGEMSLVGPRPLLMQYLPL

70 80 90 100 110 120

130 140 150 160 170 180

orf3a.pep  YDNFQNRRHEMKPGITGWAQVNGRNALSWDERFACDIWYIDHFSLCLDIKILLLTVKKVL

I I I I I 1 I I I I I I I I i I I I I I I I I I I I I I I I I : I I I I : I I I I I I I I I I I I I I I I I I I I I I i

orf3-1     YDNFQNRRHEMKPGITGWAQVNGRNALSWDEKFACDVWYIDHFSLCLDIKILLLTVKKVL

130 140 150 160 170 180

190 200 210 220 230 240

orf3a.pep  IKEGISAQGEATMPPFTGKRKLAVVGAGGHGKVVAELAAALGTYGEIVFLDDRVQGSVNG

I I I I I [ I I I I I II I I I I I I I I I I I I I I I I II I I I I : I I I I I I I I I I II I I I : I I I I I I
orf3-1     IKEGISAQGEATMPPFTGKRKLAVVGAGGHGKVVADLAAALGRYREIVFLDDRAQGSVNG

190 200 210 220 230 240

250 260 270 280 290 300

orf3a.pep  FPVIGTTLLLENSLSPEQFDIAVAVGNNRIRRQIAEKAAALGFALPVLIHPDSTVSPSAT
I I I I I I I I I I M I I II I : I : I I I I I I I I I II I I I I I I I I I I I I I I I I : I I I : I I I I M I
orf3-1     FSVIGTTLLLENSLSPEQYDVAVAVGNNRIRRQIAEKAAALGFALPVLVHPDATVSPSAT

250 260 270 280 290 300

310 320 330 340 350 360

orf3a.pep  VGQGGVVMAKAVVQADSVLKDGVIVNTAATVDHDCLLDAFVHISPGAHLSGNTRIGEESW
I I M : I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I : I I I I I I I I I I I I 1 I I : I I I I I I
orf3-1     VGQGSVVMAKAVVQAGSVLKDGVIVNTAATVDHDCLLNAFVHISPGAHLSGNTHIGEESW

310 320 330 340 350 360

370 380 390 400 410

orf3a.pep  IGTGACSRQQIRIGSRATIGAGAVVVRDVSDGMTVAGNPAKPLAGKNTETLRSX

I I I I I I I I I I I I I I I I II I I I I I I II I I I I II I I I I I I I I II I I I I I
orf3-1     IGTGACSRQQIRIGSRATIGAGAVVVRDVSDGMTVAGNPAKPLPRKNPETSTAX

370 380 390 400 410

Homology with hypothetical protein encoded by yvfc gene (accession Z71928) of B. subtilis 
ORF3 and YVFC proteins show 55% aa identity in 170 aa overlap (BLASTp): 

ORF3 3 IYLIRKNLGSPVFFFQERPGKDGKPFKMVKFRSMRDGLYSDGIPLPDGERLTPFGKKLRA 62 

I ++R +GSPVFF Q RPG GKPF + KFR+M D S G LPD RLT G+ +R 
yvfc 27 IAWRLKIGSPVFFKQVRPGLHGKPFTLYKFRTMTDERDSKGNLLPDEVRLTKTGRLIRK 86 

ORF3 63 ASXDELPELWNILKGEMSLVGPRPLLMQYLPLYDNFQNRRHEMKPGITGWAQVNGRNALS 122 

S DELP+L N+LKG++ S LVGPRPLLM YLPLY Q RRHE+KPGITGWAQ+NGRNA+S 
yvfc 87 LSIDELPQLLNVLKGDLSLVGPRPLLMDYLPLYTEKQARRHEVKPGITGWAQINGRNAIS 146

ORF3 123 WDEKFACDVWYIDHFSLCLDXXXXXXXXXXXXXXEGISAQGEXTMPPFTG 172
W++KF DVWY+D++S LD EGI T FTG

yvfc 147 WEKKFELDVWYVDNWSFFLDLKILCLTVRKVLVSEGIQQTNHVTAERFTG 196
Homology with a predicted ORF from N. gonorrhoeae

ORF3 shows 86.3% identity over a 286 aa overlap with a predicted ORF (ORF3.ng) from N.
gonorrhoeae:

orf3                              ILIYLIRKNLGSPVFFFQERPGKDGKPFKMVKFR  34
: I I I I I I I I I I I I I I :: i I I i I I I I I I I I I I I 1
orf3ng  MSKAVKRLFDIIASASGLIVLSPVFLVLIYLIRKNLGSPVFFIRERPGKDGKPFKMVKFR  60

orf3    SMRDGLYSDGIPLPDGERLTPFGKKLRAASXDELPELWNILKGEMSLVGPRPLLMQYLPL  94
1111:1 I ! I I I I f I : I I I E llllllhi I I I I I I I I : I I I I I I I I I ! I I I I I I I I I I
orf3ng  SMRDALDSDGIPLPDSERLTDFGKKLRATSLDELPELWNVLKGEMSLVGPRPLLMQYLPL  120

orf3    YDNFQNRRHEMKPGITGWAQVNGRNALSWDEKFACDVWYIDHFSLCLDIKILLLTVKKVL  154
I :: I I I I I I I I I I I I I I I I I I I I I t I I M I I I I : II I I I |:||: I I : I I I : I I I I I I I
orf3ng  YNKFQNRRHEMKPGITGWAQVNGRNALSWDEKFSCDVWYTDNFSFWLDMKILFLTVKKVL  180

orf3    IKEGISAQGEXTMPPFTGKRKLAVVGAGGHGKVVADLAAALGRYREIVFLDDRAQGSVNG  214
MINIMI! lllll:hlllll:lllllllll|:IMIII I lllllllhllllll
orf3ng  IKEGISAQGEATMPPFAGNRKLAVIGAGGHGKVVAELAAALGTYGEIVFLDDRTQGSVNG  240

orf3    FSVIGTTLLLENSLSPEQYDVAVAVGNNRIRRQIAEKAAALGFALPVLVHPDATVSPSAT  274
I I M M M M M I M M : I : : M M M M M M : I : M I M I I I I h I I I I I I I I I I
orf3ng  FPVIGTTLLLENSLSPEQFDITVAVGNNRIRRQITENAAALGFKLPVLIHPDATVSPSAI  300

orf3    VGQGSVVMAKAV  286
: I M I M M M I
orf3ng  IGQGSVVMAKAVVQAGSVLKDGVIVNTAATVDHDCLLDAFVHISPGAHLSGNTRIGEESR  360



The complete length ORF3ng nucleotide sequence <SEQ ID 17> is:

1 ATGAGTAAAG CCGTCAAACG CCTGTTCGAC ATCATCGCAT CCGCATCGGG

51 GCTGATTGTC CTGTCGCCCG TGTTTTTGGT TTTAATATAC CTCATCCGCA

101 AAAACTTAGG TTCGCCCGTC TTCTTCattC GGGAACGCCc cgGAAAGGAc

151 ggaaaacCTT TTAAAATGGT CAAATTCCGT TCCAtgcgcg acgcgcttGA

201 TTCAGACGGC ATTCCGCTGC CCGATAGCGA ACGCCTGACC GATTTCGGCA

251 AAAAATTACG CGCCACCAGT TTGGACGAAC TTCCTGAATT ATGGAATGTC

301 CTCAAAGGCG AGATGAGCCT GGTCGGCCCC CGCCCGCTTT TGATGCAGTA

351 TCTGCCGCTT TACAACAAAT TTCAAAACCG CCGCCACGAA ATGAAACCGG

401 GCATTACCGG CTGGGCGCAG GTCAACGGGC GCAACGCGCT TTCGTGGGAC

451 GAAAAGTTCT CCTGCGATGT TTGGTACACC GACAATTTCA GCTTTTGGCT

501 GGATATGAAA ATCCTGTTTC TGACAGTCAA AAAAGTCTTG ATTAAAGAAG

551 GCATTTCGGC GCAAGGGGAA GCCACCATGC CCCCTTTCGC GGGGAATCGC

601 AAACTCGCCG TTATCGGCGC GGGCGGACAC GGCAAAGTCG TTGCCGAGCT

651 TGCCGCCGCA CTCGGCACAT ACGGCGAAAT CGTTTTTCTG GACGACCGCA

701 CCCAAGGCAG CGTCAACGGC TTCCCCGTCA TCGGCACGAC GCTGCTGCTT

751 GAAAACAGTT TATCGCCCGA ACAATTCGAC ATCACCGTCG CCGTCGGCAA

801 CAACCGCATC CGCCGCCAAA TCACCGAAAA CGCCGCCGCG CTCGGCTTCA

851 AACTGCCCGT TCTGATTCAT CCCGACGCGA CCGTCTCGCC TTCTGCAATA

901 ATCGGACAAG GCAGCGTCGT AATGGCGAAA GCCGTCGTAC AGGCCGGCAG

951 CGTATTGAAA GACGGCGTGA TTGTGAACAC TGCCGCCACC GTCGATCACG

1001 ACTGCCTGCT TGACGCTTTC GtccaCATCA GCCCGGGCGC GCACCTGTCG

1051 GGCAACACGC GTATCGGCGA AGAAAGCCGG ATAGGCACGG GCGCGTGCAG

1101 CCGCCAGCAG ACAACCGTCG GCAGCGGGGT TACCgccgGT GCAGGGgcGG

1151 TTATCGTATG CGACATCCCG GACGGCATGA CCGTCGCGGG CAACCCGGCA

1201 AAGCCCCTTA CGGGCAAAAA CCCCAAGACC GGGACGGCAT AA



This encodes a protein having amino acid sequence <SEQ ID 18>:

1 MSKAVKRLFD IIASASGLIV LSPVFLVLIY LIRKNLGSPV FFIRERPGKD

51 GKPFKMVKFR SMRDALDSDG IPLPDSERLT DFGKKLRATS LDELPELWNV

101 LKGEMSLVGP RPLLMQYLPL YNKFQNRRHE MKPGITGWAQ VNGRNALSWD

151 EKFSCDVWYT DNFSFWLDMK ILFLTVKKVL IKEGISAQGE ATMPPFAGNR

201 KLAVIGAGGH GKVVAELAAA LGTYGEIVFL DDRTQGSVNG FPVIGTTLLL

251 ENSLSPEQFD ITVAVGNNRI RRQITENAAA LGFKLPVLIH PDATVSPSAI

301 IGQGSVVMAK AVVQAGSVLK DGVIVNTAAT VDHDCLLDAF VHISPGAHLS

351 GNTRIGEESR IGTGACSRQQ TTVGSGVTAG AGAVIVCDIP DGMTVAGNPA

401 KPLTGKNPKT GTA*
This protein shows 86.9% identity in 413 aa overlap with ORF3-1:

10 20 30 40 50 60 

MSKFFKRLFDIVASASGLIFLSPVFLILIYLIRKNLGSPVFFFQERPGKDGKPFKMVKFR 
III I I I I I I : I I I I I I I I I I I II : I I II I II I I I I I I I I : : I I I I I I I I I I I I II I I 
MSKAVKRLFDIIASASGLIVLSPVFLVLIYLIRKNLGSPVFFIRERPGKDGKPFKMVKFR 
10 20 30 40 50 60 

70 80 90 100 110 120 

SMRDALDSDGIPLPDGERLTPFGKKLRAASLDELPELWNILKGEMSLVGPRPLLMQYLPL 
I I I I I I I I I I I I I II : I I I I I II I I I I : I I I I I I I I I I : I I I I I I I I I I I I I I I I I II I 
SMRDALDSDGIPLPDSERLTDFGKKLRATSLDELPELWNVLKGEMSLVGPRPLLMQYLPL 
70 80 90 100 110 120 

130 140 150 160 170 180 

YDNFQNRRHEMKPGITGWAQVNGRNALSWDEKFACDVWYIDHFSLCLDIKILLLTVKKVL 
I :: I I I I I I I I I I I I I I I I I I I I I I I I I I i I I I : I I I I I I: I I: I I : I I I : I I I I I I I 
YNKFQNRRHEMKPGITGWAQVNGRNALSWDEKFSCDVWYTDNFSFWLDMKILFLTVKKVL 
130 140 150 160 170 180 

190 200 210 220 230 240 

IKEGISAQGEATMPPFTGKRKLAVVGAGGHGKVVADLAAALGRYREIVFLDDRAQGSVNG
I I I I I I I I I I I I I I I I : I : I I I I I : I I I II I I I I I : I I I I I I I I I I I I I I I : I I I I I I
IKEGISAQGEATMPPFAGNRKLAVIGAGGHGKVVAELAAALGTYGEIVFLDDRTQGSVNG
190 200 210 220 230 240 

250 260 270 280 290 300 

FSVIGTTLLLENSLSPEQYDVAVAVGNNRIRRQIAEKAAALGFALPVLVHPDATVSPSAT 
I I I I I I I I I I I I I I I I I : I : : I I I I I I I I I I I I : I : I I I I I I I I I I : I I I I I I I I I I 
FPVIGTTLLLENSLSPEQFDITVAVGNNRIRRQITENAAALGFKLPVLIHPDATVSPSAI 
250 260 270 280 290 300 

310 320 330 340 350 360 

VGQGSVVMAKAVVQAGSVLKDGVIVNTAATVDHDCLLNAFVHISPGAHLSGNTHIGEESW
: I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I : I I I I I I I I I I I I I I I : I I I I I
IGQGSVVMAKAVVQAGSVLKDGVIVNTAATVDHDCLLDAFVHISPGAHLSGNTRIGEESR
310 320 330 340 350 360 

370 380 390 400 410 

IGTGACSRQQIRIGSRATIGAGAVVVRDVSDGMTVAGNPAKPLPRKNPETSTAX
I I I I I I I I I I : I I : I 11111:1 I : I I I I I I I I I I I I I I I I : I : I I I 
IGTGACSRQQTTVGSGVTAGAGAVIVCDIPDGMTVAGNPAKPLTGKNPKTGTAX 
370 380 390 400 410 

In addition, ORF3ng shows significant homology with a hypothetical protein from B. subtilis:

gnl|PID|e238668 (Z71928) hypothetical protein [Bacillus subtilis]
>gi|1945702|gnl|PID|e313004 (Z94043) hypothetical protein [Bacillus subtilis]
>gi|2635938|gnl|PID|ell86113 (Z99121) similar to capsular polysaccharide
biosynthesis [Bacillus subtilis] Length = 202

Score = 235 bits (594), Expect = 3e-61
Identities = 114/195 (58%), Positives = 142/195 (72%)

Query:   5 VKRLFDIIASASGLIVLSPVFLVLIYLIRKNLGSPVFFIRERPGKDGKPFKMVKFRSMRD 64
           +KRLFD+ A+ L S + L I ++R +GSPVFF + RPG GKPF + KFR+M D
Sbjct:   3 LKRLFDLTAAIFLLCCTSVIILFTIAWRLKIGSPVFFKQVRPGLHGKPFTLYKFRTMTD 62

Query:  65 ALDSDGIPLPDSERLTDFGKKLRATSLDELPELWNVLKGEMSLVGPRPLLMQYLPLYNKF 124
           DS G LPD RLT G+ +R S+DELP+L N VLKG+ + S LVG PRPLLM YLPLY +
Sbjct:  63 ERDSKGNLLPDEVRLTKTGRLIRKLSIDELPQLLNVLKGDLSLVGPRPLLMDYLPLYTEK 122

Query: 125 QNRRHEMKPGITGWAQVNGRNALSWDEKFSCDVWYTDNFSFWLDMKILFLTVKKVLIKEG 184
           Q RRHE+KPGITGWAQ+NGRNA+SW++KF DVWY DN+SF+LD+KIL LTV+KVL+ EG
Sbjct: 123 QARRHEVKPGITGWAQINGRNAISWEKKFELDVWYVDNWSFFLDLKILCLTVRKVLVSEG 182

Query: 185 ISAQGEATMPPFAGN 199
           I T F G+
Sbjct: 183 IQQTNHVTAERFTGS 197
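The summary line of this BLAST hit reports identities and positives as counts over the aligned length, with a percentage in parentheses. The printed percentages are consistent with truncation rather than rounding (142/195 is 72.8%, reported as 72%). The Python sketch below parses such a line and recomputes both percentages under that truncation assumption; the regular expression and function name are hypothetical and cover only this one line format.

```python
import re

# Matches e.g. "Identities = 114/195 (58%), Positives = 142/195 (72%)".
STATS_LINE = re.compile(r"(Identities|Positives)\s*=\s*(\d+)/(\d+)\s*\((\d+)%\)")


def parse_blast_stats(line: str) -> dict[str, tuple[int, int, int, int]]:
    """Return {name: (count, length, reported_pct, recomputed_pct)}.

    recomputed_pct truncates 100 * count / length to an integer, which is an
    assumption about how the percentages were printed.
    """
    stats = {}
    for name, count_text, length_text, pct_text in STATS_LINE.findall(line):
        count, length = int(count_text), int(length_text)
        stats[name.lower()] = (count, length, int(pct_text), (100 * count) // length)
    return stats


summary = parse_blast_stats("Identities = 114/195 (58%), Positives = 142/195 (72%)")
print(summary["identities"])   # (114, 195, 58, 58)
print(summary["positives"])    # (142, 195, 72, 72)
```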
The hypothetical product of the yvfc gene shows similarity to EXOY of R. meliloti, an
exopolysaccharide production protein. Based on this and on the two predicted transmembrane
regions in the homologous N. gonorrhoeae sequence, it is predicted that these proteins, or their
epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies.

Example 4 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 19>:

1 . .AACCATATGG CGATTGTCAT CGACGAATAC GGCGGCACAT CCGGCTTGGT 

51 CACCTTTGAA GACATCATCG AGCAAATCGT CGGCGAAATC GAAGACGAGT 

101 TTGACGAAGA CGATAGCGCC GACAATATCC ATGCCGTTTC TTCAGACACG 

151 TGGCGCATCC ATGCAGCTAC CGAAATCGAA GACATCAACA CCTTCTTCGG 

201 CACGGAATAC AGCATCGAAG AAGCCGACAC CATT.GGCGG CCTGGTCATT 

251 CAAGAGTTGG GACATCTGCC CGTGCGCGGC GAAAAAGTCC TTATCGGCGG 

301 TTTGCAGTTC ACCGTCGCAC GCGCCGACAA CCGCCGCCTG CATACGCTGA 

351 TGGCGACCCG CGTGAAGTAA GC ACCGC CGTTTCTGCA 

401 CAGTTTAG 

This corresponds to amino acid sequence <SEQ ID 20; ORF5>: 

1 ..NHMAIVIDEY GGTSGLVTFE DIIEQIVGEI EDEFDEDDSA DNIHAVSSDT
51 WRIHAATEIE DINTFFGTEY SIEEADTIXR PGHSRVGTSA RARRKSPYRR 
101 FAVHRRTRRQ PPPAYADGDP REVS XR RFCTV* 

Further sequence analysis revealed the complete DNA sequence to be <SEQ ID 21>: 

1 ATGGACGGCG CACAACCGAA AACGAATTTT TTTGAACGCC TGATTGCCCG 

51 ACTCGCCCGC GAACCCGATT CCGCCGAAGA CGTATTAAAC CTGCTTCGGC 

101 AGGCGCACGA GCAGGAAGTT TTTGATGCGG ATACGCTTTT AAGATTGGAA 

151 AAAGTCCTCG ATTTTTCCGA TTTGGAAGTG CGCGACGCGA TGATTACGCG 

201 CAGCCGTATG AACGTTTTAA AAGAAAACGA CAGCATCGAG CGCATCACCG 

251 CCTACGTTAT CGATACCGCC CATTCGCGCT TCCCCGTCAT GGGCGAAGAC 

301 AAAGACGAAG TTTTGGGCAT TTTGCACGCC AAAGACCTGC TCAAATATAT 

351 GTTTAACCCC GAGCAGTTCC ACCTCAAATC CATTCTCCGC CCCGCCGTCT 

401 TCGTCCCCGA AGGCAAATCG CTGACCGCCC TTTTAAAAGA GTTCCGCGAA 

451 CAGCGCAACC ATATGGCGAT TGTCATCGAC GAATACGGCG GCACATCCGG 

501 CTTGGTCACC TTTGAAGACA TCATCGAGCA AATCGTCGGC GAAATCGAAG 

551 ACGAGTTTGA CGAAGACGAT AGCGCCGACA ATATCCATGC CGTTTCTTCC 

601 GAACGCTGGC GCATCCATGC AGCTACCGAA ATCGAAGACA TCAACACCTT 

651 CTTCGGCACG GAATACAGCA GCGAAGAAGC CGACACCATT CGGCCTGGTC 

701 ATTCAAGAGT TGGGACATCT GCCCGTGCGC GGCGAAAAAG TCCTTATCGG 

751 CGGTTTGCAG TTCACCGTCG CACGCGCCGA CAACCGCCGC CTGCATACGC 

801 TGATGGCGAC CCGCGTGAAG TAAGCACCGC CGTTTCTGCA CAGTTTAGGA 

851 TGACGGTACG GGCGTTTTCT GTTTCAATCC GCCCCATCCG CCAAACATAA 

This corresponds to amino acid sequence <SEQ ID 22; ORF5-1>:



1 MDGAQPKTNF FERLIARLAR EPDSAEDVLN LLRQAHEQEV FDADTLLRLE 

51 KVLDFSDLEV RDAMITRSRM NVLKENDSIE RITAYVIDTA HSRFPVIGED 

101 KDEVLGILHA KDLLKYMFNP EQFHLKSILR PAVFVPEGKS LTALLKEFRE 

151 QRNHMAIVID EYGGTSGLVT FEDIIEQIVG EIEDEFDEDD SADNIHAVSS 

201 ERWRIHAATE IEDINTFFGT EYSSEEADTI RPGHSRVGTS ARARRKSPYR 

251 RFAVHRRTRR QPPPAYADGD PREVSTAVSA QFRMTVRAFS VSIRPIRQT* 

Further work identified the corresponding gene in strain A of N. meningitidis <SEQ ID 23>:



1 ATGGACGGCG CACAACCGAA AACAAATTTT TTNNAACGCC TGATTGCCCG 

51 ACTCGCCCGC GAACCCGATT CCGCCGAAGA CGTATTGACC CTGTTGCGCC 

101 AAGCGCACGA ACAGGAAGTA TTTGATGCGG ATACGCTTTT AAGATTGGAA 

151 AAAGTCCTCG ATTTTTCTGA TTTGGAAGTG CGCGACGCGA TGATTACGCG 

201 CAGCCGTATG AACGTTTTAA AAGAAAACGA CAGCATCGAA CGCATCACCG 

251 CCTACGTTAT CGATACCGCC CATTCGCGCT TCCCCGTCAT CGGTGAAGAC 

301 AAAGACGAAG TTTTGGGTAT TTTGCACGCC AAAGACCTGC TCAAATATAT 

351 GTTCAACCCC GAGCAGTTCC ACCTCAAATC GATATTGCGC CCTGCCGTCT 
401 TCGTCCCCGA AGGCAAATCG CTGACCGCCC TTTTAAAAGA GTTCCGCGAA

451 CAGCGCAACC ATATGGCAAT CGTCATCGAC GAATACGGCG GCACGTCGGG 

501 TTTGGTAACT TTTGAAGACA TCATCGAGCA AATCGTCGGC GACATCGAAG 

551 ATGAGTTTGA CGAAGACGAA AGCGCGGACA ACATCCACGC CGTTTCCGCC 

601 GAACGCTGGC GCATCCACGC GGCTACCGAA ATCGAAGACA TCAACGCCTT 

651 TTTCGGCACG GAATACAGCA GCGAAGAAGC CGACACCATC GGCGGCCNTG 

701 GTCATTCAGG AATTGGNACA CCTGCCCGTG CGCGGCGAAA AAGTCNTTAT 

751 CGGCGNNTTG CANTTCACNG TCGCCNGCGC NGACAACCGC CGCCTGCATA 

801 CGCTGATGGC GACCCGCGTG AAGTAAGCTC CGCCGTTTCT GTACAGTTTA 

851 GGATGACGGT ACGGGCGTTT TCTGTTTCAA TCCGCCCCAT CCGCCANACA 

901 TAA 

This encodes a protein having amino acid sequence <SEQ ID 24; ORF5a>: 



1 MDGAQPKTNF XXRLIARLAR EPDSAEDVLT LLRQAHEQEV FDADTLLRLE 
51 KVLDFSDLEV RDAMITRSRM NVLKENDSIE RITAYVIDTA HSRFPVIGED 
101 KDEVLGILHA KDLLKYMFNP EQFHLKSILR PAVFVPEGKS LTALLKEFRE 
151 QRNHMAIVID EYGGTSGLVT FEDIIEQIVG DIEDEFDEDE SADNIHAVSA 
201 ERWRIHAATE IEDINAFFGT EYSSEEADTI GGXGHSGIGT PARARRKSXY 
251 RRXAXHXRXR XQPPPAYADG DPREVSSAVS VQFRMTVRAF SVSIRPIRXT 
301 * 

The originally-identified partial strain B sequence (ORF5) shows 54.7% identity over a 124aa 
overlap with ORF5a: 



10 20 30 

or f 5. pep NHMAIVIDEYGGTSGLVTFEDIIEQIVGEI 

I I I I I I I I I I I I I I I I I I I I II I I I I I I : I 
orf5a FHLKSILRPAVFVPEGKSLTALLKEFREQRNHMAIVIDEYGGTSGLVTFEDIIEQIVGDI 
130 140 150 160 170 180 



40 50 60 70 80 90 

orf 5 . pep EDEFDEDDSADNIHAVSSDTWRIHAATEIEDINTFFGTEYSIEEADTIXRPGHSRVGTSA 
I I I I I I t : I I I I I I I I I : : I I I I I I I I I I I I I : I I I I I I I MINI III :|| I 
orf 5a EDEFDEDESADNIHAVSAERWRIHAATEIEDINAFFGTEYSSEEADTIGGXGHSGIGTPA 
190 200 210 220 230 240 

100 110 120 130 

orf5.pep RARRKSPYRRFAVHRRTRRQPPPAYADGDPREVSXXXXXRRFCTV 

I I I I I I III I I CI I I II I I I I II I I I I I 
orf5a RARRKSXYRRXAXHXRXRXQPPPAYADGDPREVSSAVSVQFRMTVRAFSVSIRPIRXTX 
250 260 270 280 290 300 

The complete strain B sequence (ORF5-1) and ORF5a show 92.7% identity in 300 aa overlap: 



10 20 30 40 50 60 

orf5a.pep MDGAQPKTNFXXRLIARLAREPDSAEDVLTLLRQAHEQEVFDADTLLRLEKVLDFSDLEV 

I I I I I I I I I I I I I I I I I I I I I I I I II I : I I I I I I I I I I I I I 1 I I I II I I I I I II 1 ! I I 
orf5-1 MDGAQPKTNFFERLIARLAREPDSAEDVLNLLRQAHEQEVFDADTLLRLEKVLDFSDLEV 

10 20 30 40 50 60 



70 80 90 100 110 120 

orf5a.pep RDAMITRSRMNVLKENDSIERITAYVIDTAHSRFPVIGEDKDEVLGILHAKDLLKYMFNP 
I I I M I I 1 I I I I I I I 1 i I I I I I I I I I I I I I I M I I I I I I I I I I I I I I I I I I I I I I I I I I I 
orf5-1 RDAMITRSRMNVLKENDSIERITAYVIDTAHSRFPVIGEDKDEVLGILHAKDLLKYMFNP 

70 80 90 100 110 120 



130 140 150 160 170 180 

orf5a.pep EQFHLKSILRPAVFVPEGKSLTALLKEFREQRNHMAIVIDEYGGTSGLVTFEDIIEQIVG 
Ml I MM MM il I I I IIMIjlllMI I! Ml II Ml II! IIIIMI III MM Ml I 
orf5-1 EQFHLKSILRPAVFVPEGKSLTALLKEFREQRNHMAIVIDEYGGTSGLVTFEDIIEQIVG 

130 140 150 160 170 180 



190 200 210 220 230 240 

orf5a.pep DIEDEFDEDESADNIHAVSAERWRIHAATEIEDINAFFGTEYSSEEADTIGGXGHSGIGT 
: I } I I I I I I : I I I I I I I I I : I I I I I I I I II I I II I : I I I I I I II I I I II I III : I I 
orf5-1 EIEDEFDEDDSADNIHAVSSERWRIHAATEIEDINTFFGTEYSSEEADTIRP-GHSRVGT 

190 200 210 220 230 



250 260 270 280 290 300 
orf5a.pep PARARRKSXYRRXAXHXRXRXQPPPAYADGDPREVSSAVSVQFRMTVRAFSVSIRPIRXT 
I I I I I I I III I I 1:1 I i 1 I I I I I II II I I I : I I I : I I II I I I I I I I I I I II I I 
orf5-1 SARARRKSPYRRFAVHRRTRRQPPPAYADGDPREVSTAVSAQFRMTVRAFSVSIRPIRQT 
240 250 260 270 280 290 
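The identity figures quoted for these alignments (for example, 92.7% identity in a 300 aa overlap) are the number of identical aligned residues divided by the length of the overlap. A minimal Python sketch of that arithmetic follows, assuming the two sequences are supplied already aligned as equal-length strings with '-' for gaps. How gap columns and undetermined residues (X) are counted here is an assumption; the FASTA and BLAST programs that produced the figures above may count them differently. `percent_identity` is an illustrative name only.

```python
from typing import Tuple


def percent_identity(aligned_a: str, aligned_b: str) -> Tuple[float, int]:
    """Return (percent identity, overlap length) for two aligned sequences.

    Assumptions (illustrative, not the exact rules of FASTA or BLAST):
    columns with a gap ('-') in either sequence are left out of the overlap,
    and an undetermined residue ('X') never counts as identical.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # Keep only columns where both sequences contribute a residue.
    paired = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != "-" and b != "-"]
    if not paired:
        raise ValueError("the alignment has no overlapping residues")
    identical = sum(1 for a, b in paired if a == b and a != "X")
    return 100.0 * identical / len(paired), len(paired)


# N-terminal 17 residues of ORF5a and ORF5-1 as aligned above.
pct, overlap = percent_identity("MDGAQPKTNFXXRLIAR", "MDGAQPKTNFFERLIAR")
print(f"{pct:.1f}% identity in {overlap} aa overlap")  # 88.2% identity in 17 aa overlap
```

Excluding gap columns keeps the denominator equal to the number of residue pairs actually compared; a program that counts gap columns in the overlap would report a slightly lower percentage for gapped alignments.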

Further work identified a partial DNA sequence in N. gonorrhoeae <SEQ ID 25>, which encodes a protein having amino acid sequence <SEQ ID 26; ORF5ng>:

1 MDGAQPKTNF FERLIARLAR EPDSAEDVLN LLRQAHEQEV FDADTLTRLE 

51 KVLDFAELEV RDAMITRSRM NVLKENDSIE RITAYVIDTA HSRFPVIGED 

101 KDEVLGILHA KDLLKYMFNP EQFHLKSVLR PAVFVPEGKS LTALLKEFRE 

151 QRNHMAIVID EYGGTSGLVT FEDIIEQIVG DIEDEFDEDE SADDIHSVSA 

201 ERWRIHAATE IEDINAFFGT EYGSEEADTI RRLGHSGIGT PARARRKSPY 

251 RRFAVHRRPR RQPPPAHADG DPREVSRACP HRRFCTV* 

Further analysis revealed the complete gonococcal nucleotide sequence <SEQ ID 27> to be: 

1 ATGGACGGCG CACAACCGAA AACAAATTTT TTTGAACGCC TGATTGCCCG 

51 ACTCGCCCGC GAACCCGATT CCGCCGAAGA CGTATTAAAC CTGCTTCGGC 

101 AGGCGCACGA ACAGGAAGTT TTTGATGCCG ACACACTGAC CCGGCTGGAA 

151 AAAGTATTGG ACTTTGCCGA GCTGGAAGTG CGCGATGCGA TGATTACGCG 

201 CAGCCGCATG AACGTATTGA AAGAAAACGA CAGCATCGAA CGCATCACCG 

251 CCTACGTCAT CGATACCGCC CATTCGCGCT TCCCCGTCAT CGGCGAAGAC 

301 AAAGACGAAG TTTTGGGCAT TTTGCACGCC AAAGACCTGC TCAAATATAT 

351 GTTCAACCCC GAGCAGTTCC ACCTGAAATC CGTCTTGCGC CCTGCCGTTT 

401 TCGTGCCCGA AGGCAAATCT TTGACCGCCC TTTTAAAAGA GTTCCGCGAA 

451 CAGCGCAACC ATATGGCAAT CGTCATCGAC GAATACGGCG GCACGTCGGG 

501 TTTGGTCACC TTTGAAGACA TCATCGAGCA AATCGTCGGT GACATCGAAG 

551 ACGAGTTTGA CGAAGACGAA AGCGccgacg acatCCACTC cgTTTccgCC 

601 GAACGCTGGC GCATCCacgc ggctaCCGAA ATCGAAGaca TCAACGCCTT 

651 TTTCGGTACG GAatacggca gcgaagaagc cgacaccatc cggcggctTG 

701 GTCATTCAGG AATTGGGACA CCTGCCCGTG CGCGGCGAAA AAGTCCTTAt 

751 cggcgGTTTG Cagttcaccg tCGCCCGCGC CGACAACCGC CGCCTGCACA 

801 CGCTGATGGC GACCCGCGTG AAGTAAGCAG AGCCTGCCcg AccgccgttT 

851 CTGCacAGTT TAGGatgACG gtaCGGTCGT TTTCTGTTTC AATCCGCCCC 

901 ATCCGCCAAA CATAA 

This encodes a protein having amino acid sequence <SEQ ID 28; ORF5ng-1>: 

1 MDGAQPKTNF FERLIARLAR EPDSAEDVLN LLRQAHEQEV FDADTLTRLE 
51 KVLDFAELEV RDAMITRSRM NVLKENDSIE RITAYVIDTA HSRFPVIGED 
101 KDEVLGILHA KDLLKYMFNP EQFHLKSVLR PAVFVPEGKS LTALLKEFRE 
151 QRNHMAIVID EYGGTSGLVT FEDIIEQIVG DIEDEFDEDE SADDIHSVSA 
201 ERWRIHAATE IEDINAFFGT EYGSEEADTI RRLGHSGIGT PARARRKSPY 
251 RRFAVHRRPR RQPPPAHADG DPREVSRACP TAVSAQFRMT VRSFSVSIRP 
301 IRQT* 
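Each "this encodes a protein" step is a translation of the nucleotide listing in its first reading frame using the standard genetic code. A minimal Python sketch follows, assuming frame +1 from the first listed base and that ambiguous bases (N and similar codes) should give X, as in the protein listings above. The helper names are illustrative and are not taken from the software used in the original analysis.

```python
# Standard genetic code. Codons are enumerated with the first base varying
# slowest, in the order T, C, A, G (TTT, TTC, TTA, TTG, TCT, ...).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}


def clean_listing(listing: str) -> str:
    """Drop position numbers and spaces from a numbered listing and upper-case it."""
    return "".join(ch for ch in listing.upper() if ch.isalpha())


def translate(listing: str) -> str:
    """Translate reading frame +1; '*' marks a stop codon.

    Codons containing N or another ambiguity code are translated as X.
    A trailing incomplete codon is ignored.
    """
    seq = clean_listing(listing)
    return "".join(
        CODON_TABLE.get(seq[pos:pos + 3], "X") for pos in range(0, len(seq) - 2, 3)
    )


# First row of the gonococcal nucleotide sequence <SEQ ID 27>.
print(translate("1 ATGGACGGCG CACAACCGAA AACAAATTTT"))  # MDGAQPKTNF
```

The printed residues match the first ten residues of ORF5ng-1 <SEQ ID 28>.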

The originally identified partial strain B sequence (ORF5) shows 83.1% identity over a 135 aa overlap with the partial gonococcal sequence (ORF5ng):

orf5 NHMAIVIDEYGGTSGLVTFEDIIEQIVGEI 30 

I Ml I I I M I I I I I I I I II I I I I I II M:l 
orf5ng FHLKSVLRPAVFVPEGKSLTALLKEFREQRNHMAIVIDEYGGTSGLVTFEDIIEQIVGDI 182 

orf5 EDEFDEDDSADNIHAVSSDTWRIHAATEIEDINTFFGTEYSIEEADTIXRPGHSRVGTSA 90 

I Mill l:!ll:ll:| I:: I I I I I I I I I I I I I : I II I I I : I I I I I I I I I I Ml I 
orf5ng EDEFDEDESADDIHSVSAERWRIHAATEIEDINAFFGTEYGSEEADTIRRLGHSGIGTPA 242 

orf5 RARRKSPYRRFAVHRRTRRQPPPAYADGDPREVSX----RRFCTV 131 

I I M I I I II I I I I II I I I 1 t I t f : I I I I I 1 I 1 I Ml Ml 

orf5ng RARRKSPYRRFAVHRRPRRQPPPAHADGDPREVSRACPHRRFCTV 287 

The complete strain B and gonococcal sequences (ORF5-1 and ORF5ng-1) show 92.4% identity in 304 aa overlap:

10 20 30 40 50 60 

orf5ng-1.pep MDGAQPKTNFFERLIARLAREPDSAEDVLNLLRQAHEQEVFDADTLTRLEKVLDFAELEV 
!( I I I I II I I I I II II I I II I I I I I I I I I I I I I II I I I I I I II I I I I I I I I I I I h: I I I 
orf5-1 MDGAQPKTNFFERLIARLAREPDSAEDVLNLLRQAHEQEVFDADTLLRLEKVLDFSDLEV 

10 20 30 40 50 60 






70 80 90 100 110 120 

orf5ng-1.pep RDAMITRSRMNVLKENDSIERITAYVIDTAHSRFPVIGEDKDEVLGILHAKDLLKYMFNP 
II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I 1 I I I I II I I I I I I I I I I I I I I I I 
orf5-1 RDAMITRSRMNVLKENDSIERITAYVIDTAHSRFPVIGEDKDEVLGILHAKDLLKYMFNP 

70 80 90 100 110 120 



130 140 150 160 170 180 

orf5ng-1.pep EQFHLKSVLRPAVFVPEGKSLTALLKEFREQRNHMAIVIDEYGGTSGLVTFEDIIEQIVG 
I I I I I I I : I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I II I I I I I I I I I I I I 
orf5-1 EQFHLKSILRPAVFVPEGKSLTALLKEFREQRNHMAIVIDEYGGTSGLVTFEDIIEQIVG 
130 140 150 160 170 180 

190 200 210 220 230 240 

orf5ng-1.pep DIEDEFDEDESADDIHSVSAERWRIHAATEIEDINAFFGTEYGSEEADTIRRLGHSGIGT 
: I I I I I I I I ; I I I : I I z I I : I I I I I I I I I I I I I I I : I I I I I I : I I i I I I I I Ml : I I 
orf5-1 EIEDEFDEDDSADNIHAVSSERWRIHAATEIEDINTFFGTEYSSEEADTIRP-GHSRVGT 

190 200 210 220 230 



250 260 270 280 290 300 

orf5ng-1.pep PARARRKSPYRRFAVHRRPRRQPPPAHADGDPREVSRACPTAVSAQFRMTVRSFSVSIRP 
I I I I I I I I I I I I I I I I I I I I I I I I : I I M I I I I I I I I I I I I I I I II : I I I I I I I 

orf5-1 SARARRKSPYRRFAVHRRTRRQPPPAYADGDPREVS----TAVSAQFRMTVRAFSVSIRP 

240 250 260 270 280 290 

orf5ng-1.pep IRQTX 
I I I I I 
orf5-1 IRQTX 
300 

Computer analysis of these amino acid sequences indicated a putative leader sequence and identified the following homologies:

Homology with hemolysin homolog TlyC (accession U32716) of H. influenzae
ORF5 and TlyC proteins show 58% aa identity in 77 aa overlap (BLASTp).

ORF5    2 HMAIVIDEYGGTSGLVTFEDIIEQIVGEIEDEFDEDDSADNIHAVSSDTWRIHAATEIED 61
          HMAIV+DE+G  SGLVT EDI+EQIVG+IEDEFDE++ AD I  +S  T+ + A T+I+D
TlyC  166 HMAIVVDEFGAVSGLVTIEDILEQIVGDIEDEFDEEEIAD-IRQLSRHTYAVRALTDIDD 224

ORF5   62 INTFFGTEYSIEEADTI 78
           N  F T++  EE DTI
TlyC  225 FNAQFNTDFDDEEVDTI 241





ORF5ng-1 also shows significant homology with TlyC: 

SCORES Init1: 301 Initn: 419 Opt: 668 

Smith-Waterman score: 668; 45.9% identity in 242 aa overlap 

10 20 30 40 50 

orf5ng-1.pep MDGAQPKTNFFERLIARLAR-EPDSAEDVLNLLRQAHEQEVFDADTLTRLEK 

| | | : | : : | : : | : | :::::: I :::::::: I : I : | 
tlyc_haein MNDEQQNSNQSENTKKPFFQSLFGRFFQGELKNREELVEVIRDSEQNDLIDQNTREMIEG 

10 20 30 40 50 60 



60 70 80 90 100 109 

orf5ng-1.pep VLDFAELEVRDAMITRSRMNVLKENDSIERITAYVIDTAHSRFPVIGE--DKDEVLGILH 
I : : : t I I : I I I II II:: :::::::: : I : : I I I I I I I I : : I : I : : : I 1 I I 

tlyc_haein VMEIAELRVRDIMIPRSQIIFIEDQQDLNTCLNTIIESAHSRFPVIADADDRDNIVGILH 

70 80 90 100 110 120 






110 120 130 140 150 160 

orf5ng-1.pep AKDLLKYMF-NPEQFHLKSVLRPAVFVPEGKSLTALLKEFREQRNHMAIVIDEYGGTSGL 
llllll:: : I I I : I : II I : I : I I I : I : : I I : I I : I I I I I I : I I : I : : I I I 
tlyc_haein AKDLLKFLREDAEVFDLSSLLRPVVIVPESKRVDRMLKDFRSERFHMAIVVDEFGAVSGL 
130 140 150 160 170 180 

170 180 190 200 210 220 

orf5ng-1.pep VTFEDIIEQIVGDIEDEFDEDESADDIHSVSAERWRIHAATEIEDINAFFGTEYGSEEAD 
I I : I I I : I I I I I I I I I I I I I : I I I I : : : I : : : : I I : I : I : I I I : I : : : I I : I 

tlyc_haein VTIEDILEQIVGDIEDEFDEEEIAD-IRQLSRHTYAVRALTDIDDFNAQFNTDFDDEEVD 

190 200 210 220 230 

230 240 250 260 270 280 

orf5ng-1.pep TIRRLGHSGIG-TPARARRKSPYRRFAVHRRPRRQPPPAHADGDPREVSRACPTAVSAQF 

II I : :| I I: 

tlyc_haein TIGGLIMQTFGYLPKRGEEIILKNLQFKVTSADSRRLIQLRVTVPDEHLAEMNNVDEKSE 
240 250 260 270 280 290 
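The "Smith-Waterman score" reported above is the best score of a local alignment found by dynamic programming. A minimal Python sketch of the recurrence follows. It assumes a toy scheme (+3 for a match, -3 for a mismatch, a linear penalty of 2 per gap position). Search programs normally use a substitution matrix and affine gap penalties, so this sketch will not reproduce the score of 668 quoted above.

```python
def smith_waterman_score(a: str, b: str, match: int = 3, mismatch: int = -3,
                         gap: int = -2) -> int:
    """Return the best local-alignment score of a and b.

    Toy scoring (an assumption): identical residues score `match`, any other
    pair scores `mismatch`, and each gap position costs `gap` (linear gaps).
    """
    previous = [0] * (len(b) + 1)  # row i-1 of the DP matrix
    best = 0
    for i in range(1, len(a) + 1):
        current = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            current[j] = max(
                0,                       # start a new local alignment here
                previous[j - 1] + pair,  # align a[i-1] with b[j-1]
                previous[j] + gap,       # a[i-1] against a gap
                current[j - 1] + gap,    # b[j-1] against a gap
            )
            best = max(best, current[j])
        previous = current
    return best


print(smith_waterman_score("GTTAC", "GTTGAC"))  # 13: GTT-AC over GTTGAC
```

The floor of 0 in each cell is what makes the alignment local: a poorly matching prefix is discarded rather than dragging the score down. Only two rows of the matrix are kept, which is enough for the score but not for tracing back the alignment itself.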

Homology with a hypothetical secreted protein from E.coli: 

ORF5a shows homology to a hypothetical secreted protein from E.coli: 

sp|P77392|YBEX_ECOLI HYPOTHETICAL 33.3 KD PROTEIN IN CUTE-ASNB INTERGENIC REGION
>gi|1778577 (U82598) similar to H. influenzae [Escherichia coli] >gi|1786879
(AE000170) f292; This 292 aa ORF is 23% identical (9 gaps) to 272 residues of an
approx. 440 aa protein YTFL_HAEIN SW: P44717 [Escherichia coli] Length = 292

Score = 212 bits (533), Expect = 3e-54 

Identities = 112/230 (48%), Positives = 149/230 (64%), Gaps = 3/230 (1%) 

Query: 2 DGAQPKTNFXXRLIARLAR-EPDSAEDVLTLLRQAHEQEVFDADTLLRLEKVLDFSDLEV 60 

D K F L+++L EP + +++L L+R + + ++ D DT LE V+D +D V 
Sbjct: 10 DTISNKKGFFSLLLSQLFHGEPKNRDELLALIRDSGQNDLIDEDTRDMLEGVMDIADQRV 69 

Query: 61 RDAMITRSRMNVLKENDSIERITAYVIDTAHSRFPVIGEDKDEVLGILHAKDLLKYM-FN 119 
RD MI RS+M  LK N +++     +I++AHSRFPVI EDKD + GIL AKDLL +M  +

Sbjct: 70 RDIMIPRSQMITLKRNQTLDECLDVIIESAHSRFPVISEDKDHIEGILMAKDLLPFMRSD 129 

Query: 120 PEQFHLKSILRPAVFVPEGKSLTALLKEFREQRNHMAIVIDEYGGTSGLVTFEDIIEQIV 179 
E F + +LR AV VPE K + +LKEFR QR HMAIVIDE+GG SGLVT EDI+E IV 
Sbjct: 130 AEAFSMDKVLRQAVVVPESKRVDRMLKEFRSQRYHMAIVIDEFGGVSGLVTIEDILELIV 189 

Query: 180 GDIEDEFDEDESADNIHAVSAERWRIHAATEIEDINAFFGTEYSSEEADT 229 

G+IEDE+DE++ D +S W + A IED N FGT +S EE DT 
Sbjct: 190 GEIEDEYDEEDDID-FRQLSRHTWTVRALASIEDFNEAFGTHFSDEEVDT 238 

Based on this analysis, including the amino acid homology to the TlyC hemolysin homologue from H. influenzae (hemolysins are secreted proteins), it was predicted that the proteins from N. meningitidis and N. gonorrhoeae are secreted and could thus be useful antigens for vaccines or diagnostics.

ORF5-1 (30.7 kDa) was cloned in the pGEX vector and expressed in E.coli, as described above. The products of protein expression and purification were analyzed by SDS-PAGE. Figure 2A shows the results of affinity purification of the GST-fusion protein. The purified GST-fusion protein was used to immunise mice, whose sera were used for Western blot analysis (Figure IB). These experiments confirm that ORF5-1 is a surface-exposed protein and that it is a useful immunogen.
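A molecular mass such as the 30.7 kDa given for ORF5-1 can be estimated by summing average residue masses and adding one water for the free termini. A minimal Python sketch follows, assuming standard average residue masses rounded to two decimals. It ignores the GST fusion partner, any removal of a leader peptide and any post-translational modification, so it is not meant to reproduce the figure quoted above exactly.

```python
# Average residue masses in daltons (free amino acid mass minus one water).
RESIDUE_MASS = {
    "G": 57.05, "A": 71.08, "S": 87.08, "P": 97.12, "V": 99.13,
    "T": 101.11, "C": 103.14, "L": 113.16, "I": 113.16, "N": 114.10,
    "D": 115.09, "Q": 128.13, "K": 128.17, "E": 129.12, "M": 131.19,
    "H": 137.14, "F": 147.18, "R": 156.19, "Y": 163.18, "W": 186.21,
}
WATER = 18.02  # added once for the N- and C-termini


def average_mass_kda(listing: str) -> float:
    """Estimate the average mass (kDa) of a protein given as a numbered listing.

    Position numbers, spaces and the '*' stop symbol are skipped; any other
    residue code without a defined mass (such as X) raises ValueError.
    """
    residues = [ch for ch in listing.upper() if ch.isalpha()]
    unknown = sorted({ch for ch in residues if ch not in RESIDUE_MASS})
    if unknown:
        raise ValueError(f"no mass defined for residue(s): {', '.join(unknown)}")
    return (sum(RESIDUE_MASS[ch] for ch in residues) + WATER) / 1000.0


print(round(average_mass_kda("GG") * 1000, 2))  # 132.12, glycylglycine
```

Raising on X rather than guessing keeps the estimate honest for the partial sequences above that still contain undetermined residues.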

Example 5 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 29>: 



1 ATGCGCGGCG GCAGGCCGGA TTCCGTTACC GTGCAGATTA TCGAAGGTTC 
51 GCGTTTTTCG CATATGAGGA AAGTCATCGA CGCAACGCCC GACATCGGAC 
101 ACGACACCAA AGGCTGGAGC AATGAAAAAC TGATGGCGGA AGTTGCGCCC 

151 GATGCCTTCA GCGGCAATCC TGAAgGGCAG TTTTTCCCCG ACAGCTACGA 

201 AATCGATGCG GGCGGCAGTG ATTTGCAGAT TTACCAAACC GCCTACAAgG 

251 GCGATGCAAC GCCGCCTGAA TGAgGGCATG GGAAAGCAGG CAGGACGGGC 

301 TGCCTTATAA AAACCCTTAT GAAATGCTGA TTATGGCGAr CCTGGTCGAA 

351 AAGGAAACAG GGCATGAAGC CGAsCsCGAC CATGTcGCTT CCGTCTTCGT 

401 CAACCGCCTG AAAATCGGTA TGCGCCTGCA AACCgAssCG TCCGTGATTT 

451 ACGGCATGGG TGCGGCATAC AAGGGCAAAA TCCGTAAAGC CGACCTGCGC 

501 CGCGACACGC CGTACAACAC CTACACGCGC GGCGGTCTGC CGCCAACCCC 

551 GATTGCGCTG CCC.. 

This corresponds to the amino acid sequence <SEQ ID 30; ORF7>: 



1 MRGGRPDSVT VQIIEGSRFS HMRKVIDATP DIGHDTKGWS NEKLMAEVAP 

51 DAFSGNPEGQ FFPDSYEIDA GGSDLQIYQT AYKAMQRRLN EAWESRQDGL 

101 PYKNPYEMLI MAXLVEKETG HEAXXDHVAS VFVNRLKIGM RLQTXXSVIY 

151 GMGAAYKGKI RKADLRRDTP YNTYTRGGLP PTPIALP.. 

Further sequence analysis revealed the complete DNA sequence <SEQ ID 31>: 



1 ATGTTGAGAA AATTGTTGAA ATGGTCTGCC GTTTTTTTGA CCGTGTCGGC 

51 AGCCGTTTTC GCCGCGCTGC TTTTTGTTCC TAAGGATAAC GGCAGGGCAT 

101 ACCGAATCAA AATTGCCAAA AACCAGGGTA TTTCGTCGGT CGGCAGGAAA 

151 CTTGCCGAAG ACCGCATCGT GTTCAGCAGG CATGTTTTGA CGGCGGCGGC 

201 CTACGTTTTG GGTGTGCACA ACAGGCTGCA TACGGGGACG TACAGATTGC 

251 CTTCGGAAGT GTCTGCTTGG GATATCTTGC AGAAAATGCG CGGCGGCAGG 

301 CCGGATTCCG TTACCGTGCA GATTATCGAA GGTTCGCGTT TTTCGCATAT 

351 GAGGAAAGTC ATCGACGCAA CGCCCGACAT CGGACACGAC ACCAAAGGCT 

401 GGAGCAATGA AAAACTGATG GCGGAAGTTG CGCCCGATGC CTTCAGCGGC 

451 AATCCTGAAG GGCAGTTTTT CCCCGACAGC TACGAAATCG ATGCGGGCGG 

501 CAGTGATTTG CAGATTTACC AAACCGCCTA CAAGGCGATG CAACGCCGCC 

551 TGAATGAGGC ATGGGAAAGC AGGCAGGACG GGCTGCCTTA TAAAAACCCT 

601 TATGAAATGC TGATTATGGC GAGCCTGGTC GAAAAGGAAA CAGGGCATGA 

651 AGCCGACCGC GACCATGTCG CTTCCGTCTT CGTCAACCGC CTGAAAATCG 

701 GTATGCGCCT GCAAACCGAC CCGTCCGTGA TTTACGGCAT GGGTGCGGCA 

751 TACAAGGGCA AAATCCGTAA AGCCGACCTG CGCCGCGACA CGCCGTACAA 

801 CACCTACACG CGCGGCGGTC TGCCGCCAAC CCCGATTGCG CTGCCCGGCA 

851 AGGCGGCACT CGATGCCGCC GCCCATCCGT CCGGCGAAAA ATACCTGTAT 

901 TTCGTGTCCA AAATGGACGG CACGGGCTTG AGCCAGTTCA GCCATGATTT 

951 GACCGAACAC AATGCCGCCG TCCGCAAATA TATTTTGAAA AAATAA 

This corresponds to the amino acid sequence <SEQ ID 32; ORF7-1>: 

1 MLRKLLKWSA VFLTVSAAVF AALLFVPKDN GRAYRIKIAK NQGISSVGRK 

51 LAEDRIVFSR HVLTAAAYVL GVHNRLHTGT YRLPSEVSAW DILQKMRGGR 

101 PDSVTVQIIE GSRFSHMRKV IDATPDIGHD TKGWSNEKLM AEVAPDAFSG 

151 NPEGQFFPDS YEIDAGGSDL QIYQTAYKAM QRRLNEAWES RQDGLPYKNP 

201 YEMLIMASLV EKETGHEADR DHVASVFVNR LKIGMRLQTD PSVIYGMGAA 

251 YKGKIRKADL RRDTPYNTYT RGGLPPTPIA LPGKAALDAA AHPSGEKYLY 

301 FVSKMDGTGL SQFSHDLTEH NAAVRKYILK K* 
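Going from the partial ORF7 sequence to the complete ORF7-1, which runs from an ATG start codon to a TAA stop codon, amounts to finding an open reading frame. A minimal Python sketch follows, assuming ATG is the only start codon and TAA, TAG and TGA are the stop codons. Bacterial genes can also start at GTG or TTG, and this simple scan is not claimed to be the procedure used in the original analysis.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_orf(listing: str) -> str:
    """Return the first ATG-initiated open reading frame, stop codon included.

    Scans ATG positions from the 5' end and reads codons in that frame until
    the first stop; returns "" if no ATG is followed by an in-frame stop.
    """
    seq = "".join(ch for ch in listing.upper() if ch.isalpha())
    start = seq.find("ATG")
    while start != -1:
        for pos in range(start, len(seq) - 2, 3):
            if seq[pos:pos + 3] in STOP_CODONS:
                return seq[start:pos + 3]
        start = seq.find("ATG", start + 1)
    return ""


print(first_orf("CCATGAAATAGGG"))  # ATGAAATAG
```

Passing the result to a frame-1 translation would give the protein, ending in '*' for the stop codon, as in the listings above.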

Computer analysis of this amino acid sequence gave the following results: 

Homology with hypothetical protein encoded by yceg gene (accession P44270) of H. influenzae 
ORF7 and yceg proteins show 44% aa identity in 192 aa overlap: 

ORF7 1 MRGGRPDSVTVQIIEGSRFSHMRKVIDATPDIGHDTKGWSNEKLMA EVAPDAFSG 55 

+ G+ V+ IEG F RK ++ P + K SNE++ A ++ + 

yceg 102 LNSGKEVQFNVKWIEGKTFKDWRKDLENAPHLVQTLKDKSNEEIFALLDLPDIGQNLELK 161 

ORF7 56 NPEGQFFPDSYEIDAGGSDLQIYQTAYKAMQRRLNEAWESRQDGLPYKNPYEMLIMAXLV 115 

N EG +PD+Y +DL++ + + + M++ LN+AW R + LP NPYEMLI+A +V 

yceg 162 NVEGWLYPDTYNYTPKSTDLELLKRSAERMKKALNKAWNERDEDLPLANPYEMLILASIV 221 

ORF7 116 EKETGHEAXXDHVASVFVNRLKIGMRLQTXXSVIYGMGAAYKGKIRKADLRRDTPYNTYT 175 

EKETG VASVF+NRLK M+LQT +VIYGMG Y G IRK DL TPYNTY 

yceg 222 EKETGIANERAKVASVFINRLKAKMKLQTDPTVIYGMGENYNGNIRKKDLETKTPYNTYV 281 
ORF7 176 RGGLPPTPIALP 187 

GLPPTPIA+P 
yceg 282 IDGLPPTPIAMP 293 

The complete length YCEG protein has sequence: 

1 MKKFLIAILL LILILAGVAS FSYYKMTEFV KTPVNVQADE LLTIERGTTS 

51 SKLATLFEQE KLIADGKLLP YLLKLKPELN KIKAGTYSLE NVKTVQDLLD 

101 LLNSGKEVQF NVKWIEGKTF KDWRKDLENA PHLVQTLKDK SNEEIFALLD 

151 LPDIGQNLEL KNVEGWLYPD TYNYTPKSTD LELLKRSAER MKKALNKAWN 

201 ERDEDLPLAN PYEMLILASI VEKETGIANE RAKVASVFIN RLKAKMKLQT 

251 DPTVIYGMGE NYNGNIRKKD LETKTPYNTY VIDGLPPTPI AMPSESSLQA 

301 VANPEKTDFY YFVADGSGGH KFTRNLNEHN KAVQEYLRWY RSQKNAK 

Homology with a predicted ORF from N. meningitidis (strain A) 

ORF7 shows 95.2% identity over a 187 aa overlap with an ORF (ORF7a) from strain A of N. meningitidis:

10 20 30 

orf7.pep MRGGRPDSVTVQIIEGSRFSHMRKVIDATP 

! I I I ! I I I I I I I I I I I I I I I I I I I I I I I I I 
orf7a AAYVLGVHNRLHTGTYRLPSEVSAWDILQKMRGGRPDSVTVQIIEGSRFSHMRKVIDATP 
70 80 90 100 110 120 

40 50 60 70 80 90 

orf7.pep DIGHDTKGWSNEKLMAEVAPDAFSGNPEGQFFPDSYEIDAGGSDLQIYQTAYKAMQRRLN 
II M I M I I I I I M I t I I I M II I I 1 I I II M I I II I I I M I I I : I I I MINIMI! 
orf7a DIEHDTKGWSNEKLMAEVAPDAFSGNPEGQFFPDSYEIDAGGSDLRIYQIAYKAMQRRLN 
130 140 150 160 170 180 

100 110 120 130 140 150 

orf7.pep EAWESRQDGLPYKNPYEMLIMAXLVEKETGHEAXXDHVASVFVNRLKIGMRLQTXXSVIY 
I M II II II I M M M M M M I : II II I I II I II I I I I II II I I I I II I I I I II 
orf7a EAWESRQDGLPYKNPYEMLIMASLIEKETGHEADRDHVASVFVNRLKIGMRLQTDPSVIY 
190 200 210 220 230 240 

160 170 180 

orf7.pep GMGAAYKGKIRKADLRRDTPYNTYTRGGLPPTPIALP 

M I I II i i I I II I II I II i II I I I I I I II II I II II I 
orf7a GMGAAYKGKIRKADLRRDTPYNTYTRGGLPPTPIALPGKAALDAAAHPSGEKYLYFVSKM 
250 260 270 280 290 300 

orf7a DGTGLSQFSHDLTEHNAAVRKYILKKX 
310 320 330 

The complete length ORF7a nucleotide sequence <SEQ ID 33> is: 

1 ATGTTGAGAA AATTGTTGAA ATGGTCTGCC GTTTTTTTGA CCGTATCGGC 

51 AGCCGTTTTC GCCGCGCTGC TTTTCGTCCC TAAAGACAAC GGCAGGGCAT 

101 ACAGGATTAA AATTGCCAAA AACCAGGGTA TTTCGTCGGT CGGCAGGAAA 

151 CTTGCCGAAG ACCGCATCGT GTTCAGCAGG CATGTTTTGA CGGCGGCGGC 

201 CTACGTTTTG GGTGTGCACA ACAGGCTGCA TACGGGGACG TACAGACTGC 

251 CTTCGGAAGT GTCTGCTTGG GATATCTTGC AGAAAATGCG CGGCGGCAGG 

301 CCGGATTCCG TTACCGTGCA GATTATCGAA GGTTCGCGTT TTTCGCATAT 

351 GAGGAAAGTC ATCGACGCAA CGCCCGACAT CGAACACGAC ACCAAAGGCT 

401 GGAGCAATGA AAAACTGATG GCGGAAGTTG CCCCTGATGC CTTCAGCGGC 

451 AATCCTGAAG GGCAGTTTTT CCCCGACAGC TACGAAATCG ATGCGGGCGG 

501 CAGCGATTTA CGGATTTACC AAATCGCCTA CAAGGCGATG CAACGCCGAC 

551 TGAATGAGGC ATGGGAAAGC AGGCAGGACG GGCTGCCTTA TAAAAACCCT 

601 TATGAAATGC TGATTATGGC GAGCCTGATC GAAAAGGAAA CAGGGCATGA 

651 AGCCGACCGC GACCATGTCG CTTCCGTCTT CGTCAACCGC CTGAAAATCG 

701 GTATGCGCCT GCAAACCGAC CCGTCCGTGA TTTACGGCAT GGGTGCGGCA 

751 TACAAGGGCA AAATCCGTAA AGCCGACCTG CGCCGCGACA CGCCGTACAA 

801 CACCTACACG CGCGGCGGTC TGCCGCCAAC CCCGATCGCG CTGCCCGGCA 

851 AGGCGGCACT CGATGCCGCC GCCCATCCGT CCGGTGAAAA ATACCTGTAT 

901 TTCGTGTCCA AAATGGACGG TACGGGCTTG AGCCAGTTCA GCCATGATTT 

951 GACCGAACAC AACGCCGCCG TTCGCAAATA TATTTTGAAA AAATAA 
This is predicted to encode a protein having amino acid sequence <SEQ ID 34>: 

1 MLRKLLKWSA VFLTVSAAVF AALLFVPKDN GRAYRIKIAK NQGISSVGRK 

51 LAEDRIVFSR HVLTAAAYVL GVHNRLHTGT YRLPSEVSAW DILQKMRGGR 

101 PDSVTVQIIE GSRFSHMRKV IDATPDIEHD TKGWSNEKLM AEVAPDAFSG 

151 NPEGQFFPDS YEIDAGGSDL RIYQIAYKAM QRRLNEAWES RQDGLPYKNP 

201 YEMLIMASLI EKETGHEADR DHVASVFVNR LKIGMRLQTD PSVIYGMGAA 

251 YKGKIRKADL RRDTPYNTYT RGGLPPTPIA LPGKAALDAA AHPSGEKYLY 

301 FVSKMDGTGL SQFSHDLTEH NAAVRKYILK K* 

A leader peptide is underlined. 
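Leader (signal) peptides such as the one noted for ORF7a typically contain a short hydrophobic stretch near the N-terminus. A minimal Python sketch of a Kyte-Doolittle hydropathy profile follows, assuming a 9-residue window. It only highlights hydrophobic stretches, is much cruder than dedicated signal-peptide predictors, and is not claimed to be the computer analysis used here.

```python
from typing import List

# Kyte-Doolittle hydropathy values (positive = hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(listing: str, window: int = 9) -> List[float]:
    """Mean hydropathy of each window of `window` residues, N- to C-terminus.

    Digits, spaces and '*' are skipped; an undetermined residue (X) is given
    a neutral value of 0.0, which is an assumption of this sketch.
    """
    residues = [ch for ch in listing.upper() if ch.isalpha()]
    values = [KYTE_DOOLITTLE.get(ch, 0.0) for ch in residues]
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


# First 30 residues of ORF7a / ORF7-1.
profile = hydropathy_profile("MLRKLLKWSA VFLTVSAAVF AALLFVPKDN")
peak = max(range(len(profile)), key=profile.__getitem__)
print(f"most hydrophobic window starts at residue {peak + 1}: {profile[peak]:.2f}")
```

In this N-terminal region, windows covering the AVFLTVSAAVFAALLF stretch average about 2 or more, well above the strongly hydrophilic charged residues that follow.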
ORF7a and ORF7-1 show 98.8% identity in 331 aa overlap: 

10 20 30 40 50 60 

orf7a.pep MLRKLLKWSAVFLTVSAAVFAALLFVPKDNGRAYRIKIAKNQGISSVGRKLAEDRIVFSR 
I I I 1 I I I I I I I I I I I I 1 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
orf7-1 MLRKLLKWSAVFLTVSAAVFAALLFVPKDNGRAYRIKIAKNQGISSVGRKLAEDRIVFSR 
10 20 30 40 50 60 

70 80 90 100 110 120 

orf7a.pep HVLTAAAYVLGVHNRLHTGTYRLPSEVSAWDILQKMRGGRPDSVTVQIIEGSRFSHMRKV 
I I I I I I I II I I I I I I I I t I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I 
orf7-1 HVLTAAAYVLGVHNRLHTGTYRLPSEVSAWDILQKMRGGRPDSVTVQIIEGSRFSHMRKV 

70 80 90 100 110 120 

130 140 150 160 170 180 

orf7a.pep IDATPDIEHDTKGWSNEKLMAEVAPDAFSGNPEGQFFPDSYEIDAGGSDLRIYQIAYKAM 

I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I : I I I I I I I I 

orf7-1 IDATPDIGHDTKGWSNEKLMAEVAPDAFSGNPEGQFFPDSYEIDAGGSDLQIYQTAYKAM 
130 140 150 160 170 180 

190 200 210 220 230 240 

orf7a.pep QRRLNEAWESRQDGLPYKNPYEMLIMASLIEKETGHEADRDHVASVFVNRLKIGMRLQTD 

I II [ I I I I II I I 1 I I I I I I I I I I I I I I I I : I I I I I I I I I I I I I II I I I I I I I I I I I I I I I 
orf7-1 QRRLNEAWESRQDGLPYKNPYEMLIMASLVEKETGHEADRDHVASVFVNRLKIGMRLQTD 
190 200 210 220 230 240 

250 260 270 280 290 300 

orf7a.pep PSVIYGMGAAYKGKIRKADLRRDTPYNTYTRGGLPPTPIALPGKAALDAAAHPSGEKYLY 
M I I I I I I M M I I I IE I M M I I I I E I i I ! I I II M II I I I I I I II II I I I I 1 1 I I I I I 
orf7-1 PSVIYGMGAAYKGKIRKADLRRDTPYNTYTRGGLPPTPIALPGKAALDAAAHPSGEKYLY 
250 260 270 280 290 300 


310 320 330 

orf7a.pep FVSKMDGTGLSQFSHDLTEHNAAVRKYILKKX 
I I I I I I I II I I I I I I I I I I I I I I I I I I I II II 
orf7-1 FVSKMDGTGLSQFSHDLTEHNAAVRKYILKKX 

310 320 330 

Homology with a predicted ORF from N. gonorrhoeae 

ORF7 shows 94.7% identity over a 187 aa overlap with a predicted ORF (ORF7ng) from N. gonorrhoeae:

orf7 MRGGRPDSVTVQIIEGSRFSHMRKVIDATPDIGHDTKGWSNEKLMAEVAPDAFSGNPEGQ 60 

I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
orf7ng MRGGRPDSVTVQIIEGSRFSHMRKVIDATPDIGHDTKGWSNEKLMAEVAPDAFSGNPEGQ 60 

orf7 FFPDSYEIDAGGSDLQIYQTAYKAMQRRLNEAWESRQDGLPYKNPYEMLIMAXLVEKETG 120 

I | | || | M I II I I I I I I I I I I I II I I I I I I I I I : I I I I II I I I I I I I I I I I 1:11111 

orf7ng FFPDSYEIDAGGSDLQIYQTAYKAMQRRLNEAWAGRQDGLPYKNPYEMLIMASLIEKETG 120 

orf7 HEAXXDHVASVFVNRLKIGMRLQTXXSVIYGMGAAYKGKIRKADLRRDTPYNTYTRGGLP 180 

I I I I 1 I I II I I I I I I II I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I 
orf7ng HEADRDHVASVFVNRLKIGMRLQTDPSVIYGMGAAYKGKIRKADLRRDTPYNTYTGGGLP 180 



orf7   PTPIALP 187
       || ||||
orf7ng PTRIALPGKAAMDAAAHPSGEKYLYFVSKMDGTGLSQFSHDLTEHNAAVRKYILKK 236

An ORF7ng nucleotide sequence <SEQ ID 35> is predicted to encode a protein having amino acid sequence <SEQ ID 36>:

1 MRGGRPDSVT VQIIEGSRFS HMRKVIDATP DIGHDTKGWS NEKLMAEVAP

51 DAFSGNPEGQ FFPDSYEIDA GGSDLQIYQT AYKAMQRRLN EAWAGRQDGL

101 PYKNPYEMLI MASLIEKETG HEADRDHVAS VFVNRLKIGM RLQTDPSVIY

151 GMGAAYKGKI RKADLRRDTP YNTYTGGGLP PTRIALPGKA AMDAAAHPSG

201 EKYLYFVSKM DGTGLSQFSH DLTEHNAAVR KYILKK*



Further sequence analysis revealed a partial DNA sequence of ORF7ng <SEQ ID 37>: 



1 ..taccgaatca AGATTGCCAA AAATCAGGGT ATTTCGTCGG TCGGCAGGAA

51 ACTTGCcgaA GACCGCATCG TGTTCAGCAG GCATGTTTTG ACAGCGGCGG

101 CCTACGTTTT GGGTGTGCAC AACAGGCTGC ATACGGGGAC gTACAGATTG

151 CCTTCGGAAG TGTCTGCTTG GGATATCTTG CAGAAAATGC GCGGCGGCAG

201 GCCGGATTCC GTTACCGTGC AGATTATCGA AGGTTCGCGT TTTTCGCATA

251 TGAGGAAAGT CATCGACGCA ACGCCCGACA TCGGACACGA CACCAAAGGC

301 TGGAGCAATG AAAAACTGAT GGCGGAAGTT GCGCCCGATG CCTTCAGCGG

351 CAATCCTGAA GGGCAGTTTT TTCCCGACAG CTACGAAATC GATGCGGGCG

401 GCAGCGATTT GCAGATTTAC CAAACCGCCT ACAAGGCGAT GCAACGCCGC

451 CTGAACGAGG CATGGGCAGG CAGGCAGGAC GGGCTGCCTT ATAAAAACCC

501 TTATGAAATG CTGATTATGG CGAGCCTGAT CGAAAAGGAA ACGGGGCATG

551 AGGCCGACCG CGACCATGTC GCTTCCGTCT TCGTCAACCG CCTGAAAATC

601 GGTATGCGCC TGCAAACCGA CCCGTCCGTG ATTTACGGCA TGGGTGCGGC

651 ATACAAGGGC AAAATCCGTA AAGCCGACCT GCGCCGCGAC ACGCCGTACA

701 aCAccTAtac gggcgggggc ttgccgccaa cccggattgc gctgcccggC

751 Aaggcggcaa tggatgccgc cgcccacccg tccggcgaAa aatacctgTa

801 tttcgtgtcC AAAATGGACG GCACGGGCTT GAGCCAGTTC AGCCATGATT

851 TGACCGAACA CAACGCCGCc gTcCGCAAAT ATATTTTGAA AAAATAA



This corresponds to the amino acid sequence <SEQ ID 38; ORF7ng-1>: 



1 ..YRIKIAKNQG ISSVGRKLAE DRIVFSRHVL TAAAYVLGVH NRLHTGTYRL

51 PSEVSAWDIL QKMRGGRPDS VTVQIIEGSR FSHMRKVIDA TPDIGHDTKG

101 WSNEKLMAEV APDAFSGNPE GQFFPDSYEI DAGGSDLQIY QTAYKAMQRR

151 LNEAWAGRQD GLPYKNPYEM LIMASLIEKE TGHEADRDHV ASVFVNRLKI

201 GMRLQTDPSV IYGMGAAYKG KIRKADLRRD TPYNTYTGGG LPPTRIALPG

251 KAAMDAAAHP SGEKYLYFVS KMDGTGLSQF SHDLTEHNAA VRKYILKK*



ORF7ng-1 and ORF7-1 show 98.0% identity in 298 aa overlap: 






10 20 30 40 50 60 

orf7-1.pep KLLKWSAVFLTVSAAVFAALLFVPKDNGRAYRIKIAKNQGISSVGRKLAEDRIVFSRHVL 

I I I II I 1 I I I 1 I I I I I I I I I I I I I I I I I I I 
orf7ng-1 YRIKIAKNQGISSVGRKLAEDRIVFSRHVL 

10 20 30 

70 80 90 100 110 120 

orf7-1.pep TAAAYVLGVHNRLHTGTYRLPSEVSAWDILQKMRGGRPDSVTVQIIEGSRFSHMRKVIDA 
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
orf7ng-1 TAAAYVLGVHNRLHTGTYRLPSEVSAWDILQKMRGGRPDSVTVQIIEGSRFSHMRKVIDA 

40 50 60 70 80 90 

130 140 150 160 170 180 

orf7-1.pep TPDIGHDTKGWSNEKLMAEVAPDAFSGNPEGQFFPDSYEIDAGGSDLQIYQTAYKAMQRR 

I I I I I t I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I i 1 
orf7ng-1 TPDIGHDTKGWSNEKLMAEVAPDAFSGNPEGQFFPDSYEIDAGGSDLQIYQTAYKAMQRR 

100 110 120 130 140 150 

190 200 210 220 230 240 

orf7-1.pep LNEAWESRQDGLPYKNPYEMLIMASLVEKETGHEADRDHVASVFVNRLKIGMRLQTDPSV 
Milt : 1 I II I I I I I I I M I I I I I I : I I I! I I I I I I I I 1 I 1 I I I I I M I I I 11 I I I I M 
orf7ng-1 LNEAWAGRQDGLPYKNPYEMLIMASLIEKETGHEADRDHVASVFVNRLKIGMRLQTDPSV 

160 170 180 190 200 210 

250 260 270 280 290 300 

orf7-1.pep IYGMGAAYKGKIRKADLRRDTPYNTYTRGGLPPTPIALPGKAALDAAAHPSGEKYLYFVS 
I I | | I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I : I I I I I I I I ! I I I I I I I 
orf7ng-1 IYGMGAAYKGKIRKADLRRDTPYNTYTGGGLPPTRIALPGKAAMDAAAHPSGEKYLYFVS 
220 230 240 250 260 270 

310 320 330 

orf7-1.pep KMDGTGLSQFSHDLTEHNAAVRKYILKKX 
I I I i I I I I I I I I I I I J I I I I I I I I I I M I 
orf7ng-1 KMDGTGLSQFSHDLTEHNAAVRKYILKKX 

280 290 



In addition, ORF7ng-1 shows significant homology with a hypothetical E.coli protein: 

sp|P28306|YCEG_ECOLI HYPOTHETICAL 38.2 KD PROTEIN IN PABC-HOLB INTERGENIC REGION
gi|1787339 (AE000210) o340; 100% identical to fragment YCEG_ECOLI SW: P28306 but
has 97 additional C-terminal residues [Escherichia coli] Length = 340

Score = 79 (36.2 bits), Expect = 5.0e-57, Sum P(2) = 5.0e-57 

Identities = 20/87 (22%), Positives = 40/87 (45%) 

Query: 10 GISSVGRKLAEDRIVFSRHVLTAAAYVLGVHNRLHTGTYRLPSEVSAWDILQKMRGGRPD 69 

G ++G +L D+I+ V + + GTYR +++ ++L+ + G+ 

Sbjct: 49 GRLALGEQLYADKIINRPRVFQWLLRIEPDLSHFKAGTYRFTPQMTVREMLKLLESGKEA 108 

Query: 70 SVTVQIIEGSRFSHMRKVIDATPDIGH 96 

++++EG R S K + P I H 
Sbjct: 109 QFPLRLVEGMRLSDYLKQLREAPYIKH 135 

Score = 438 (200.7 bits), Expect = 5.0e-57, Sum P(2) = 5.0e-57 
Identities = 84/155 (54%), Positives = 111/155 (71%) 

Query: 120 EGQFFPDSYEIDAGGSDLQIYQTAYKAMQRRLNEAWAGRQDGLPYKNPYEMLIMASLIEK 179 

EG F+PD++ A +D+ + + A+K M + ++ AW GR DGLPYK+ +++ MAS+IEK 
Sbjct: 158 EGWFWPDTWMYTANTTDVALLKRAHKKMVKAVDSAWEGRADGLPYKDKNQLVTMASIIEK 217 

Query: 180 ETGHEADRDHVASVFVNRLKIGMRLQTDPSVIYGMGAAYKGKIRKADLRRDTPYNTYTGG 239 

ET ++RD VASVF+NRL+IGMRLQTDP+VIYGMG Y GK+ +ADL T YNTYT 
Sbjct: 218 ETAVASERDKVASVFINRLRIGMRLQTDPTVIYGMGERYNGKLSRADLETPTAYNTYTIT 277 

Query: 240 GLPPTRIALPGKAAMDAAAHPSGEKYLYFVSKMDG 274 

GLPP IA PG ++ AAAHP+ YLYFV+ G 
Sbjct: 278 GLPPGAIATPGADSLKAAAHPAKTPYLYFVADGKG 312 



Based on this analysis, including the fact that the H. influenzae YCEG protein possesses a possible leader sequence, it is predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies.



Example 6 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 39>: 

1 CGTTTCAAAA TGTTAACTGT GTTGACGGCA ACCTTGATTG CCGGACAGGT 

51 ATCTGCCGCC GGAGGCGGTG CGGGGGATAT GAAACAGCCG AAGGAAGTCG 

101 GAAAGGTTTT CAGAAAGCAG CAGCGTTACA GCGAGGAAGA AATCAAAAAC 

151 GAACGCGCAC GGCTTGCGGC AGTGGGCGAG CGGGTTAATC AGATATTTAC 

201 GTTGCTGGGA GGGGAAACCG CCTTGCAAAA GGGGCAGGCG GGAACGGCTC 

251 TGGCAACCTA TATGCTGATG TTGGAACGCA CAAAATCCCC CGAAGTCGCC 

301 GAACGCGCCT TGGAAATGGC CGTGTCGCTG AACGCGTTTG AACAGGCGGA 

351 AATGATTTAT CAGAAATGGC GGCAGATTGA GCCTATACCG GGTAAGGCGC 

401 AAAAACGGGC GGGGTGGCTG CGGAACGTGC TGAGGGAAAG AGGAAATCAG 

451 CATCTGGACG GACGGGAAGA AGTGCTGGCT CAGGCGGACG AAGGACAG 

This corresponds to the amino acid sequence <SEQ ID 40; ORF9>: 

1 . . RFKMLTVLTA TLIAGQVSAA GGGAGDMKQP KEVGKVFRKQ QRYSEEEIKN 
51 ERARLAAVGE RVNQIFTLLG GETALQKGQA GTALATYMLM LERTKSPEVA 
101 ERALEMAVSL NAFEQAEMIY QKWRQIEPIP GKAQKRAGWL RNVLRERGNQ 
151 HLDGREEVLA QADEGQ 

Further sequence analysis revealed the complete DNA sequence <SEQ ID 41>: 

1 ATGTTACCTA ACCGTTTCAA AATGTTAACT GTGTTGACGG CAACCTTGAT 

51 TGCCGGACAG GTATCTGCCG CCGGAGGCGG TGCGGGGGAT ATGAAACAGC 

101 CGAAGGAAGT CGGAAAGGTT TTCAGAAAGC AGCAGCGTTA CAGCGAGGAA 

151 GAAATCAAAA ACGAACGCGC ACGGCTTGCG GCAGTGGGCG AGCGGGTTAA 

201 TCAGATATTT ACGTTGCTGG GAGGGGAAAC CGCCTTGCAA AAGGGGCAGG 

251 CGGGAACGGC TCTGGCAACC TATATGCTGA TGTTGGAACG CACAAAATCC 

301 CCCGAAGTCG CCGAACGCGC CTTGGAAATG GCCGTGTCGC TGAACGCGTT 

351 TGAACAGGCG GAAATGATTT ATCAGAAATG GCGGCAGATT GAGCCTATAC 

401 CGGGTAAGGC GCAAAAACGG GCGGGGTGGC TGCGGAACGT GCTGAGGGAA 

451 AGAGGAAATC AGCATCTGGA CGGACTGGAA GAAGTGCTGG CTCAGGCGGA 

501 CGAAGGACAG AACCGCAGGG TGTTTTTATT GTTGGCACAA GCCGCCGTGC 

551 AACAGGACGG GTTGGCGCAA AAAGCATCGA AAGCGGTTCG CCGCGCGGCG 

601 TTGAAATATG AACATCTGCC CGAAGCGGCG GTTGCCGATG TGGTGTTCAG 

651 CGTACAGGGA CGCGAAAAGG AAAAGGCAAT CGGAGCTTTG CAGCGTTTGG 

701 CGAAGCTCGA TACGGAAATA TTGCCCCCCA CTTTAATGAC GTTGCGTCTG 

751 ACTGCACGCA AATATCCCGA AATACTCGAC GGCTTTTTCG AGCAGACAGA 

801 CACCCAAAAC CTTTCGGCCG TCTGGCAGGA AATGGAAATT ATGAATCTGG 

851 TTTCCCTGCA CAGGCTGGAT GATGCCTATG CGCGTTTGAA CGTGCTGTTG 

901 GAACGCAATC CGAATGCAGA CCTGTATATT CAGGCAGCGA TATTGGCGGC 

951 AAACCGAAAA GAAGGTGCTT CCGTTATCGA CGGCTACGCC GAAAAGGCAT 

1001 ACGGCAGGGG GACGGAGGAA CAGCGGAGCA GGGCGGCGCT AACGGCGGCG 

1051 ATGATGTATG CCGACCGCAG GGATTACGCC AAAGTCAGGC AGTGGCTGAA 

1101 AAAAGTATCC GCGCCGGAAT ACCTGTTCGA CAAAGGTGTG CTGGCGGCTG 

1151 CGGCGGCTGT CGAGTTGGAC GGCGGCAGGG CGGCTTTGCG GCAGATCGGC 

1201 AGGGTGCGGA AACTTCCCGA ACAGCAGGGG CGGTATTTTA CGGCAGACAA 

1251 TTTGTCCAAA ATACAGATGC TCGCCCTGTC GAAGCTGCCC GATAAACGGG 

1301 AGGCTTTGAG GGGGTTGGAC AAGATTATCG AAAAACCGCC TGCCGGCAGT 

1351 AATACAGAGT TACAGGCAGA GGCATTGGTA CAGCGGTCAG TTGTTTACGA 

1401 TCGGCTTGGC AAGCGGAAAA AAATGATTTC AGATCTTGAA AGGGCGTTCA 

1451 GGCTTGCACC CGATAACGCT CAGATTATGA ATAATCTGGG CTACAGCCTG 

1501 CTGACCGATT CCAAACGTTT GGACGAAGGT TTCGCCCTGC TTCAGACGGC 

1551 ATACCAAATC AACCCGGACG ATACCGCTGT CAACGACAGC ATAGGCTGGG 

1601 CGTATTACCT GAAAGGCGAC GCGGAAAGCG CGCTGCCGTA TCTGCGGTAT 

1651 TCGTTTGAAA ACGACCCCGA GCCCGAAGTT GCCGCCCATT TGGGCGAAGT 

1701 GTTGTGGGCA TTGGGCGAAC GCGATCAGGC GGTTGACGTA TGGACGCAGG 

1751 CGGCACACCT TACGGGAGAC AAGAAAATAT GGCGGGAAAC GCTCAAACGT 

1801 CACGGCATCG CATTGCCCCA ACCTTCCCGA AAACCTCGGA AATAA 

This corresponds to the amino acid sequence <SEQ ID 42; ORF9-l>: 



1 MLPNRFKMLT VLTATLIAGQ VSAAGGGAGD MKQPKEVGKV FRKQQRYSEE 

51 EIKNERARLA AVGERVNQIF TLLGGETALQ KGQAGTALAT YMLMLERTKS 

101 PEVAERALEM AVSLNAFEQA EMIYQKWRQI EPIPGKAQKR AGWLRNVLRE 

151 RGNQHLDGLE EVLAQADEGQ NRRVFLLLAQ AAVQQDGLAQ KASKAVRRAA 

201 LKYEHLPEAA VADVVFSVQG REKEKAIGAL QRLAKLDTEI LPPTLMTLRL 

251 TARKYPEILD GFFEQTDTQN LSAVWQEMEI MNLVSLHRLD DAYARLNVLL 

301 ERNPNADLYI QAAILAANRK EGASVIDGYA EKAYGRGTEE QRSRAALTAA 

351 MMYADRRDYA KVRQWLKKVS APEYLFDKGV LAAAAAVELD GGRAALRQIG 

401 RVRKLPEQQG RYFTADNLSK IQMLALSKLP DKREALRGLD KIIEKPPAGS 

451 NTELQAEALV QRSVVYDRLG KRKKMISDLE RAFRLAPDNA QIMNNLGYSL 

501 LTDSKRLDEG FALLQTAYQI NPDDTAVNDS IGWAYYLKGD AESALPYLRY 

551 SFENDPEPEV AAHLGEVLWA LGERDQAVDV WTQAAHLTGD KKIWRETLKR 

601 HGIALPQPSR KPRK* 

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A) 

ORF9 shows 89.8% identity over a 166 aa overlap with an ORF (ORF9a) from strain A of N. meningitidis:

10 20 30 40 50 

orf9.pep RFKMLTVLTATLIAGQVSAAGGGAGDMKQPKEVGKVFRKQQRYSEEEIKNERARLA 
|| : | : | | : I : I : I I I : II I I : I I I I I i II I I I I I! I I I I I I 1 I I I I I I I I 
orf9a MLPARFTILSVLAAALLAGQAYAA--GAADAKPPKEVGKVFRKQQRYSEEEIKNERARLA 



WO 99/24578 









10 20 30 40 50



60 70 80 90 100 110 

orf9.pep AVGERVNQIFTLLGGETALQKGQAGTALATYMLMLERTKSPEVAERALEMAVSLNAFEQA 

M I I I I I I I I I I I I I I ! I I I I I I I I I I I I I I I I I I I I i M I I II I I I I I I I 1 I I I I I I i 
orf9a AVGERVNQIFTLLGXETALQKGQAGTALATYMLMLERTKSPEVAERALEMAVSLNAFEQA 

60 70 80 90 100 110 

120 130 140 150 160 

orf 9 . pep EMIYQKWRQIEPIPGKAQKRAGWLRNVLRERGNQHLDGREEVLAQADEGQ 
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I 
orf 9a EMIYQKWRQIEPIPGKAQKRAGWLRNVLRERGNQHLDGLEEXLAQADEXQNRRVFLLLAQ 
120 130 140 150 160 170 

orf 9a AAVQQDGLAQKASKAVRRAALRYEHLPEAAVADWFSVQXREKEKAIGALQRLAKLDTEI 
180 190 200 210 220 230 



The complete length ORF9a nucleotide sequence <SEQ ID 43> is: 



1 ATGTTACCCG CCCGTTTCAC CATTTTATCT GTGCTCGCGG CAGCCCTGCT

51 TGCCGGGCAG GCGTATGCCG CCGGCGCGGC GGATGCGAAG CCGCCGAAGG

101 AAGTCGGAAA GGTTTTCAGA AAGCAGCAGC GTTACAGCGA GGAAGAAATC

151 AAAAACGAAC GCGCACGGCT TGCGGCAGTG GGCGAGCGGG TTAATCAGAT

201 ATTTACGTTG CTGGGANGGG AAACCGCCTT GCAAAAGGGG CAGGCGGGAA

251 CGGCTCTGGC AACCTATATG CTGATGTTGG AACGCACAAA ATCCCCCGAA

301 GTCGCCGAAC GCGCCTTGGA AATGGCCGTG TCNCTGAACG CGTTTGAACA

351 GGCGGAAATG ATTTATCAGA AATGGCGGCA GATTGAGCCT ATACCGGGTA

401 AGGCGCAAAA ACGGGCGGGG TGGCTGCGGA ACGTGCTGAG GGAAAGAGGA

451 AATCAGCATC TAGACGGACT GGAAGAANTG CTGGCTCAGG CGGACGAANG

501 ACAGAACCGC AGGGTGTTTT TATTGTTGGC ACAAGCCGCC GTGCAACAGG

551 ACGGGTTGGC GCAAAAAGCA TCGAAAGCGG TTCGCCGCGC GGCGTTGAGA

601 TATGAACATC TGCCCGAAGC GGCGGTTGCC GATGTGGTGT TCAGCGTACA

651 GGNACGCGAA AAGGAAAAGG CAATCGGAGC TTTGCAGCGT TTGGCGAAGC

701 TCGATACGGA AATATTGCCC CCCACTTTAA TGACGTTGCG TCTGACTGCA

751 CGCAAATATC CCGAAATACT CGACGGCTTT TTCGAGCAGA CAGACACCCA

801 AAACCTTTCG GCCGTCTGGC AGGAAATGGA AATTATGAAT CTGGTTTCCC

851 TGCACAGGCT GGATGATGCC TATGCGCGTT TGAACGTGCT GTTGGAACGC

901 AATCCGAATG CAGACCTGTA TATTCAGGCA GCGATATTGG CGGCAAACCG

951 AAAAGAANGT GCTTCCGTTA TCGACGGCTA CGCCGAAAAG GCATACGGCA

1001 GGGGGACGGG GGAACAGCGG GGCAGGGCGG CAATGACGGC GGCGATGATA

1051 TATGCCGACC GAAGGGATTA CACCAAAGTC AGGCAGTGGT TGAAAAAAGT

1101 GTCCGCGCCG GAATACCTGT TCGACAAAGG TGTGCTGGCG GCTGCGGCGG

1151 CTGTCGAGTT GGACNGCGGC AGGGCGGCTT TGCGGCAGAT CGGCAGGGTG

1201 CGGAAACTTC CCGAACAGCA GGGGCGGTAT TTTACGGCAG ACAATTTGTC

1251 CAAAATACAG ATGTTCGCCC TGTCGAAGCT GCCCGACAAA CGGGAGGCTT

1301 TGAGGGGGTT GGACAAGATT ATCGAAAAAC CGCCTGCCGG CAGTAATACA

1351 GAGTTACAGG CAGAGGCATT GGTACAGCGG TCAGTTGTTT ACGATCGGCT

1401 TGGCAAGCGG AAAAAAATGA TTTCAGATCT TGAAAGGGCG TTCAGGCTTG

1451 CACCCGATAA CGCTCAGATT ATGAATAATC TGGGCTACAG CCTGCTTTCC

1501 GATTCCAAAC GTTTGGACGA AGGCTTCGCC CTGCTTCAGA CGGCATACCA

1551 AATCAACCCG GACGATACCG CTGTCAACGA CAGCATAGGC TGGGCGTATT

1601 ACCTGAAANG CGACGCGGAA AGCGCGCTGC CGTATCTGCG GTATTCGTTT

1651 GAAAACGACC CCGAGCCCGA AGTTGCCGCC CATTTGGGCG AAGTGTTGTG

1701 GGCATTGGGC GAACGCGATC AGGCGGTTGA CGTATGGACG CAGGCGGCAC

1751 ACCTTACGGG AGACAAGAAA ATATGGCGGG AAACGCTCAA ACGTCACGGC

1801 ATCGCATTGC CCCAACCTTC CCGAAAACCT CGGAAATAA



This encodes a protein having amino acid sequence <SEQ ID 44>: 



1 MLPARFTILS VLAAALLAGQ AYAAGAADAK PPKEVGKVFR KQQRYSEEEI

51 KNERARLAAV GERVNQIFTL LGXETALQKG QAGTALATYM LMLERTKSPE

101 VAERALEMAV SLNAFEQAEM IYQKWRQIEP IPGKAQKRAG WLRNVLRERG

151 NQHLDGLEEX LAQADEXQNR RVFLLLAQAA VQQDGLAQKA SKAVRRAALR

201 YEHLPEAAVA DVVFSVQXRE KEKAIGALQR LAKLDTEILP PTLMTLRLTA

251 RKYPEILDGF FEQTDTQNLS AVWQEMEIMN LVSLHRLDDA YARLNVLLER

301 NPNADLYIQA AILAANRKEX ASVIDGYAEK AYGRGTGEQR GRAAMTAAMI

351 YADRRDYTKV RQWLKKVSAP EYLFDKGVLA AAAAVELDXG RAALRQIGRV

401 RKLPEQQGRY FTADNLSKIQ MFALSKLPDK REALRGLDKI IEKPPAGSNT

451 ELQAEALVQR SVVYDRLGKR KKMISDLERA FRLAPDNAQI MNNLGYSLLS

501 DSKRLDEGFA LLQTAYQINP DDTAVNDSIG WAYYLKXDAE SALPYLRYSF

551 ENDPEPEVAA HLGEVLWALG ERDQAVDVWT QAAHLTGDKK IWRETLKRHG

601 IALPQPSRKP RK*



ORF9a and ORF9-1 show 95.3% identity in 614 aa overlap:

                    10        20          30        40        50
orf9a.pep   MLPARFTILSVLAAALLAGQAYAAG--AADAKPPKEVGKVFRKQQRYSEEEIKNERARLA
orf9-1      MLPNRFKMLTVLTATLIAGQVSAAGGGAGDMKQPKEVGKVFRKQQRYSEEEIKNERARLA
                    10        20        30        40        50        60

            60        70        80        90       100       110
orf9a.pep   AVGERVNQIFTLLGXETALQKGQAGTALATYMLMLERTKSPEVAERALEMAVSLNAFEQA
orf9-1      AVGERVNQIFTLLGGETALQKGQAGTALATYMLMLERTKSPEVAERALEMAVSLNAFEQA
                    70        80        90       100       110       120

           120       130       140       150       160       170
orf9a.pep   EMIYQKWRQIEPIPGKAQKRAGWLRNVLRERGNQHLDGLEEXLAQADEXQNRRVFLLLAQ
orf9-1      EMIYQKWRQIEPIPGKAQKRAGWLRNVLRERGNQHLDGLEEVLAQADEGQNRRVFLLLAQ
                   130       140       150       160       170       180

           180       190       200       210       220       230
orf9a.pep   AAVQQDGLAQKASKAVRRAALRYEHLPEAAVADVVFSVQXREKEKAIGALQRLAKLDTEI
orf9-1      AAVQQDGLAQKASKAVRRAALKYEHLPEAAVADVVFSVQGREKEKAIGALQRLAKLDTEI
                   190       200       210       220       230       240

           240       250       260       270       280       290
orf9a.pep   LPPTLMTLRLTARKYPEILDGFFEQTDTQNLSAVWQEMEIMNLVSLHRLDDAYARLNVLL
orf9-1      LPPTLMTLRLTARKYPEILDGFFEQTDTQNLSAVWQEMEIMNLVSLHRLDDAYARLNVLL
                   250       260       270       280       290       300

           300       310       320       330       340       350
orf9a.pep   ERNPNADLYIQAAILAANRKEXASVIDGYAEKAYGRGTGEQRGRAAMTAAMIYADRRDYT
orf9-1      ERNPNADLYIQAAILAANRKEGASVIDGYAEKAYGRGTEEQRSRAALTAAMMYADRRDYA
                   310       320       330       340       350       360

           360       370       380       390       400       410
orf9a.pep   KVRQWLKKVSAPEYLFDKGVLAAAAAVELDXGRAALRQIGRVRKLPEQQGRYFTADNLSK
orf9-1      KVRQWLKKVSAPEYLFDKGVLAAAAAVELDGGRAALRQIGRVRKLPEQQGRYFTADNLSK
                   370       380       390       400       410       420

           420       430       440       450       460       470
orf9a.pep   IQMFALSKLPDKREALRGLDKIIEKPPAGSNTELQAEALVQRSVVYDRLGKRKKMISDLE
orf9-1      IQMLALSKLPDKREALRGLDKIIEKPPAGSNTELQAEALVQRSVVYDRLGKRKKMISDLE
                   430       440       450       460       470       480

           480       490       500       510       520       530
orf9a.pep   RAFRLAPDNAQIMNNLGYSLLSDSKRLDEGFALLQTAYQINPDDTAVNDSIGWAYYLKXD
orf9-1      RAFRLAPDNAQIMNNLGYSLLTDSKRLDEGFALLQTAYQINPDDTAVNDSIGWAYYLKGD
                   490       500       510       520       530       540

           540       550       560       570       580       590
orf9a.pep   AESALPYLRYSFENDPEPEVAAHLGEVLWALGERDQAVDVWTQAAHLTGDKKIWRETLKR
orf9-1      AESALPYLRYSFENDPEPEVAAHLGEVLWALGERDQAVDVWTQAAHLTGDKKIWRETLKR
                   550       560       570       580       590       600

           600       610
orf9a.pep   HGIALPQPSRKPRKX
orf9-1      HGIALPQPSRKPRKX
                   610
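The identity figures quoted in this section (for example 95.3% identity in a 614 aa overlap) are ratios of identical aligned positions to overlap length. The Python sketch below illustrates that arithmetic under stated assumptions: '-' marks a gap, every column occupied by a residue in at least one row counts towards the overlap, and a column is identical when both rows carry the same letter. These conventions are assumptions, and the program that produced the quoted figures may count differently (for example in how it treats terminal stops); the function name `percent_identity` is illustrative only.

```python
from typing import Tuple


def percent_identity(row_a: str, row_b: str) -> Tuple[float, int]:
    """Return (percent identity, overlap length) for two aligned rows.

    Assumed conventions (illustrative only): '-' marks a gap, a column
    counts towards the overlap if either row has a residue there, and a
    column is identical when both rows carry the same letter.
    """
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    overlap = 0
    identical = 0
    for x, y in zip(row_a.upper(), row_b.upper()):
        if x == "-" and y == "-":
            continue  # column is empty in both rows
        overlap += 1
        if x == y:  # cannot be a gap here, since gap/gap columns were skipped
            identical += 1
    percent = 100.0 * identical / overlap if overlap else 0.0
    return round(percent, 1), overlap


# First row of the ORF9a / ORF9-1 alignment (gap written as "--").
orf9a_row1 = "MLPARFTILSVLAAALLAGQAYAAG--AADAKPPKEVGKVFRKQQRYSEEEIKNERARLA"
orf9_1_row1 = "MLPNRFKMLTVLTATLIAGQVSAAGGGAGDMKQPKEVGKVFRKQQRYSEEEIKNERARLA"
print(percent_identity(orf9a_row1, orf9_1_row1))  # (76.7, 60)
```

Under this convention the overlap for the full ORF9a/ORF9-1 alignment is the 612 ORF9a residues plus the two-column gap, which is consistent with the 614 aa overlap quoted above once the terminal stop column is left out.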



WO 99/24578 PCT/IB98/01665 

-82- 

Homology with a predicted ORF from N. gonorrhoeae 

ORF9 shows 82.8% identity over a 163aa overlap with a predicted ORF (ORF9.ng) from N. 
gonorrhoeae: 

orf9          RFKMLTVLTATLIAGQVSAAGGGAGDMKQPKEVGKVFRKQQRYSEEEIKNERAR  54
orf9ng  MIMLPARFTILSVLAAALLAGQAYAA--GAADVELPKEVGKVLRKHRRYSEEEIKNERAR  58

orf9    LAAVGERVNQIFTLLGGETALQKGQAGTALATYMLMLERTKSPEVAERALEMAVSLNAFE 114
orf9ng  LAAVGERVNRVFTLLGGETALQKGQAGTALATYMLMLERTKSPEVAERALEMAVSLNAFE 118

orf9    QAEMIYQKWRQIEPIPGKAQKRAGWLRNVLRERGNQHLDGREEVLAQADEGQ         166
orf9ng  QAEMIYQKWRQIEPIPGEAQKPAGWLRNVLKEGGNPHLDRLEEVPAQSDYVHQPMIFLLL 178

The ORF9ng nucleotide sequence <SEQ ID 45> was predicted to encode a protein having amino acid sequence <SEQ ID 46>:

1 MIMLPARFTI LSVLAAALLA GQAYAAGAAD VELPKEVGKV LRKHRRYSEE

51 EIKNERARLA AVGERVNRVF TLLGGETALQ KGQAGTALAT YMLMLERTKS

101 PEVAERALEM AVSLNAFEQA EMIYQKWRQI EPIPGEAQKP AGWLRNVLKE

151 GGNPHLDRLE EVPAQSDYVH QPMIFLLLVQ AAVQHGGVAQ KPSKAVRPAA

201 YNYEVLPETA GADAVFCVQG PQYEKAIQSF PPCGRNPQTE NIAPPFNELF

251 RPTARPISPK LLQRFFRTEP NLAKPFRPPG PEMETYQTGF PRPLTRNNPT

Amino acids 1-28 are a putative leader sequence, and 173-189 are predicted to be a transmembrane 
domain. 
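The text does not state how the leader sequence and the transmembrane segment were predicted. One widely used way to flag candidate transmembrane stretches is a Kyte-Doolittle hydropathy scan: average the per-residue hydropathy values over a sliding window and report windows whose mean exceeds a threshold. The Python sketch below assumes a 19-residue window and a 1.6 threshold, values commonly quoted for this scale; the method, the parameters and the function names `hydropathy_profile` and `hydrophobic_windows` are all assumptions for illustration, not the procedure used to obtain the 173-189 prediction.

```python
from typing import List, Tuple

# Kyte-Doolittle hydropathy index (Kyte & Doolittle, 1982).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq: str, window: int = 19) -> List[Tuple[int, float]]:
    """Mean hydropathy of every full window, keyed by its 1-based start.

    Letters outside the scale (for example 'X' or '*') score 0.0; this is
    an assumption made for the sketch.
    """
    scores = [KYTE_DOOLITTLE.get(aa, 0.0) for aa in seq.upper()]
    return [
        (start + 1, sum(scores[start:start + window]) / window)
        for start in range(len(scores) - window + 1)
    ]


def hydrophobic_windows(seq: str, window: int = 19,
                        threshold: float = 1.6) -> List[Tuple[int, int]]:
    """1-based (start, end) of every window whose mean meets the threshold."""
    return [
        (start, start + window - 1)
        for start, mean in hydropathy_profile(seq, window)
        if mean >= threshold
    ]


# Hypothetical usage on residues 151-200 of SEQ ID 46 (ORF9ng), with a
# shorter window so that a 50-residue slice still yields several windows.
segment = "GGNPHLDRLEEVPAQSDYVHQPMIFLLLVQAAVQHGGVAQKPSKAVRPAA"
for start, end in hydrophobic_windows(segment, window=11):
    print(start + 150, end + 150)  # convert to SEQ ID 46 numbering
```

The printed ranges can be compared with the predicted 173-189 segment; overlapping windows are reported individually rather than merged.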

Further sequence analysis revealed the complete length ORF9ng DNA sequence <SEQ ID 47>: 

1 ATGTTACCCG CCCGTTTCAC TATTTTATCT GTCCTCGCAG CAGCCCTGCT 

51 TGCCGGACAG GCGTATGCTG CCGGCGCGGC GGATGTGGAG CTGCCGAAGG 

101 AAGTCGGAAA GGTTTTAAGG AAACATCGGC GTTACAGCGA GGAAGAAATC 

151 AAAAACGAAC GCGCACGGCT TGCGGCAGTG GGCGAACGGG TCAACAGGGT 

201 GTTTACGCTG TTGGGCGGTG AAACGGCTTT GCAGAAAGGG CAGGCGGGAA 

251 CGGCTCTGGC AACCTATATG CTGATGTTGG AACGCACAAA ATCCCCCGAA 

301 GTCGCCGAAC GCGCCTTGGA AATGGCCGTG TCGCTGAACG CGTTTGAACA 

351 GGCGGAAATG ATTTATCAGA AATGgcggca gatcgagcct ataCcgggtg 

401 aggcgcaaaa accgGcgggG tggctgcgga acgtattgaa ggaagggGGa

451 aaTCAGCATC TGGAcgggtt gaaagaggTG CtggcgcaAT cggacgatGT 

501 GCAAAAAcgc aggaTATTTT TGCTGCTGGT GCAAGCCGCC GTGCagcagg 

551 gTGGGGTGGC TCAAAAAGCA TCGAAAGCGG TTCGCcgtgc GGcgttgaAG 

601 TATGAACATC TGCCcgaagc ggcggTTGCC GATGcggTGT TCGGCGTACA 

651 GGGACGCGAA AAGGAAAagg caaTCGAAGC TTTGCAGCGT TTGGCGAAGC 

701 TCGATACGGA AATATTGCCC CCCACTTTAA TGACGTTGCG TCTGACTGCA 

751 CGCAAATATC CCGAAATACT CGACGGCTTT TTCGAGCAGA CAGACACCCA 

801 AAACCTTTCG GCCGTCTGGC AGGAAATGGA AATTATGAAT CTGGTTTCCC 

851 TGCGTAAGCC GGATGATGCC TATGCGCGTT TGAACGTGCT GTTGGAACAC 

901 AACCCGAATG CAAACCTGTA TATTCAGGCG GCGATATTGG CGGCAAACCG 

951 AAAAGAAGGT GCGTCCGTTA TCGACGGCTA CGCCGAAAAG GCATACGGCA 

1001 GGGGGACGGG GGAACAGCGG GGCagggcgg cAATgacggc GGCGATGATA 

1051 TATGCCGACC GCAGGGATTA CGCCAAAGTC AGGCAGTGGT TGAAAAAAGT 

1101 GTCCGCGCCG GAATACCTGT TCGACAAAGG CGTGCTGGCG GCTGCGGCGG 

1151 CTGCCGAATT GGACGGAGGC CGGGCGGCTT TGCGGCAGAT CGGCAGGGTG 

1201 CGGAAACTTC CCGAACAGCA GGGGCGGTAT TTTACGGCAG ACAATTTGTC 

1251 CAAAATACAG ATGCTCGCCC TGTCGAAGCT GCCCGACAAA CGGGAAGCCC 

1301 TGATCGGGCT GAACAACATC ATCGCCAAAC TTTCGGCGGC GGGAAGCACG 

1351 GAACCTTTGG CGGAAGCATT GGCACAGCGT TCCATTATTT ACGaacAGTT 

1401 cggCAAACGG GGAAAAATGA TTGCCGACCT tgaAACcgcg CTCAAACTTA 

1451 CGCCCGATAA TGCACAAATT ATGAATAATC TGGGCTACAG CCTGCTTTCC 

1501 GATTCCAAAC GTTTGGACGA GGGTTTCGCC CTGCTTCAGA CGGCATACCA 

1551 AATCAACCCG GACGATACCG CCGTTAACGA CAGCATAGGC TGGGCGTATT 

1601 ACCTGAAAGG CGACgcggaA AGCGCGCTGC CGTATCTGcg gtattcgttt 

1651 gAAAACGACC CCGAGCCCGA AGTTGCCGCC CATTTGGGCG AAGTGTTGTG 
1701 GGCATTGGGC GAACGCGATC AGGCGGTTGA CGTATGGACG CAGGCGGCAC

1751 ACCTTAGGGG AGACAAGAAA ATATGGCGGG AGACGCTCAA ACGCTACGGA

1801 ATCGCCTTGC CCGAGCCTTC CCGAAAACCC CGGAAATAA

This encodes a protein having amino acid sequence <SEQ ID 48>: 



1 MLPARFTILS VLAAALLAGQ AYAAGAADVE LPKEVGKVLR KHRRYSEEEI

51 KNERARLAAV GERVNRVFTL LGGETALQKG QAGTALATYM LMLERTKSPE

101 VAERALEMAV SLNAFEQAEM IYQKWRQIEP IPGEAQKPAG WLRNVLKEGG

151 NQHLDGLKEV LAQSDDVQKR RIFLLLVQAA VQQGGVAQKA SKAVRRAALK

201 YEHLPEAAVA DAVFGVQGRE KEKAIEALQR LAKLDTEILP PTLMTLRLTA

251 RKYPEILDGF FEQTDTQNLS AVWQEMEIMN LVSLRKPDDA YARLNVLLEH

301 NPNANLYIQA AILAANRKEG ASVIDGYAEK AYGRGTGEQR GRAAMTAAMI

351 YADRRDYAKV RQWLKKVSAP EYLFDKGVLA AAAAAELDGG RAALRQIGRV

401 RKLPEQQGRY FTADNLSKIQ MLALSKLPDK REALIGLNNI IAKLSAAGST

451 EPLAEALAQR SIIYEQFGKR GKMIADLETA LKLTPDNAQI MNNLGYSLLS

501 DSKRLDEGFA LLQTAYQINP DDTAVNDSIG WAYYLKGDAE SALPYLRYSF

551 ENDPEPEVAA HLGEVLWALG ERDQAVDVWT QAAHLRGDKK IWRETLKRYG

601 IALPEPSRKP RK*

ORF9ng and ORF9-1 show 88.1% identity in 614 aa overlap:



                    10        20        30        40        50        60
orf9-1.pep  MLPNRFKMLTVLTATLIAGQVSAAGGGAGDMKQPKEVGKVFRKQQRYSEEEIKNERARLA
orf9ng-1    MLPARFTILSVLAAALLAGQAYAAG--AADVELPKEVGKVLRKHRRYSEEEIKNERARLA
                    10        20          30        40        50

                    70        80        90       100       110       120
orf9-1.pep  AVGERVNQIFTLLGGETALQKGQAGTALATYMLMLERTKSPEVAERALEMAVSLNAFEQA
orf9ng-1    AVGERVNRVFTLLGGETALQKGQAGTALATYMLMLERTKSPEVAERALEMAVSLNAFEQA
            60        70        80        90       100       110

                   130       140       150       160       170       180
orf9-1.pep  EMIYQKWRQIEPIPGKAQKRAGWLRNVLRERGNQHLDGLEEVLAQADEGQNRRVFLLLAQ
orf9ng-1    EMIYQKWRQIEPIPGEAQKPAGWLRNVLKEGGNQHLDGLKEVLAQSDDVQKRRIFLLLVQ
           120       130       140       150       160       170

                   190       200       210       220       230       240
orf9-1.pep  AAVQQDGLAQKASKAVRRAALKYEHLPEAAVADVVFSVQGREKEKAIGALQRLAKLDTEI
orf9ng-1    AAVQQGGVAQKASKAVRRAALKYEHLPEAAVADAVFGVQGREKEKAIEALQRLAKLDTEI
           180       190       200       210       220       230

                   250       260       270       280       290       300
orf9-1.pep  LPPTLMTLRLTARKYPEILDGFFEQTDTQNLSAVWQEMEIMNLVSLHRLDDAYARLNVLL
orf9ng-1    LPPTLMTLRLTARKYPEILDGFFEQTDTQNLSAVWQEMEIMNLVSLRKPDDAYARLNVLL
           240       250       260       270       280       290

                   310       320       330       340       350       360
orf9-1.pep  ERNPNADLYIQAAILAANRKEGASVIDGYAEKAYGRGTEEQRSRAALTAAMMYADRRDYA
orf9ng-1    EHNPNANLYIQAAILAANRKEGASVIDGYAEKAYGRGTGEQRGRAAMTAAMIYADRRDYA
           300       310       320       330       340       350

                   370       380       390       400       410       420
orf9-1.pep  KVRQWLKKVSAPEYLFDKGVLAAAAAVELDGGRAALRQIGRVRKLPEQQGRYFTADNLSK
orf9ng-1    KVRQWLKKVSAPEYLFDKGVLAAAAAAELDGGRAALRQIGRVRKLPEQQGRYFTADNLSK
           360       370       380       390       400       410

                   430       440       450       460       470       480
orf9-1.pep  IQMLALSKLPDKREALRGLDKIIEKPPAGSNTELQAEALVQRSVVYDRLGKRKKMISDLE
orf9ng-1    IQMLALSKLPDKREALIGLNNIIAKLSAAGSTEPLAEALAQRSIIYEQFGKRGKMIADLE
           420       430       440       450       460       470

                   490       500       510       520       530       540
orf9-1.pep  RAFRLAPDNAQIMNNLGYSLLTDSKRLDEGFALLQTAYQINPDDTAVNDSIGWAYYLKGD
orf9ng-1    TALKLTPDNAQIMNNLGYSLLSDSKRLDEGFALLQTAYQINPDDTAVNDSIGWAYYLKGD
           480       490       500       510       520       530

                   550       560       570       580       590       600
orf9-1.pep  AESALPYLRYSFENDPEPEVAAHLGEVLWALGERDQAVDVWTQAAHLTGDKKIWRETLKR
orf9ng-1    AESALPYLRYSFENDPEPEVAAHLGEVLWALGERDQAVDVWTQAAHLRGDKKIWRETLKR
           540       550       560       570       580       590

                   610
orf9-1.pep  HGIALPQPSRKPRKX
orf9ng-1    YGIALPEPSRKPRKX
           600       610

In addition, ORF9ng shows significant homology with a hypothetical protein from P. aeruginosa:

sp|P42810|YHE3_PSEAE HYPOTHETICAL 64.8 KD PROTEIN IN HEMM-HEMA INTERGENIC REGION
(ORF3)

>gi|1072999|pir||S49376 hypothetical protein 3 - Pseudomonas aeruginosa >gi|557259
(X82071) orf3 [Pseudomonas aeruginosa] Length = 576
Score = 128 bits (318), Expect = 1e-28

Identities = 138/587 (23%), Positives = 228/587 (38%), Gaps = 125/587 (21%)

Query: 67 VFTLLGGETALQKGQAGTALATYMLMLERTKSPEVAERALEMAVSLNAFEQAEMIYQKWR 126 

+++LL E A Q+ + AL+ Y++ ++T+ P V+ERA +A L A ++A W 
Sbjct: 53 LYSLLVAELAGQRNRFDIALSNYWQAQKTRDPGVSERAFRIAEYLGADQEALDTSLLWA 112 

Query: 127 QIEPIPGEAQKPAG WLRNVLKEGGNQHLDGLKEVLAQSDDVQKRRI 172 

+ P +AQ+ A ++ VL G+ H D L A++D + + 

Sbjct: 113 RSAPDNLDAQRAAAIQLARAGRYEESMVYMEKVLNGQGDTHFDFLALSAAETDPDTRAGL 172 

Query: 173 FXXXXXXXXXXXXXXXKASKAVRRAALKYEHLPEAAVADAVFGVQGREKEKAIEALQRLA 232 

++ KY + + A+ Q ++A+ L+ + 

Sbjct: 173 L QS FDHLLKKYPNNGQLLFGKALLLQQDGRPDEALTLLEDNS 214 

Query: 233 KLDTEILPPTLMTLRLTARKYPEILDGFFEQTDTQNLSAVWQEMEIMNLVSLRKP 287

E+PL+L + K P+GED + + + + LV + 
Sbjct: 215 ASRHEVAPLLLRSRLLQSMKRSDEALPLLKAGIKEHPDDKRVRLAYARL LVEQNRL 270 

Query: 288 DDAYARLNVLLEHNPN ANLYIQAAI 312 

DDA A L++ P+ A +Y++ + 

Sbjct: 271 DDAKAEFAGLVQQFPDDDDDLRFSLALVCLEAQAWDEARIYLEELVERDSHVDAAHFNLG 330 

Query: 313 -LAANRKEGASVIDGYAEKAYGRGTGEQRGRAAMTAAMIYADRRDYAKVRQWLKKVSAPE 371 

LA +K+ A +D YA+ GG + T++ARDAR + P+ 

Sbjct: 331 RLAEEQKDTARALDEYAQ--VGPGNDFLPAQLRQTDVLLKAGRVDEAAQRLDKARSEQPD 388

Query: 372 YLFDKXXXXXXXXXXXXXXXXXXRQIGRVRKLPEQQGRYFTADNLSKIQMLALSKLPDKR 431 

Y A L 1+ ALS + 

Sbjct: 389 Y AIQLYLIEAEALSNNDQQE 408 

Query: 432 EALIGLNNIIAKLSAAGSTEPLAEALAQRSIIYEQFGKRGKMIADLETALKLTPDNAQIM 491

+A + + + ELL RS++ E+ +M DL + PDNA + 

Sbjct: 409 KAWQAIQEGLKQYP EDL-NLLYTRSMLAEKRNDLAQMEKDLRFVIAREPDNAMAL 462

Query: 492 NNLGYSLLSDSKRLDEGFALLQTAYQINPDDTAVNDSIGWAYYLKGDAESALPYLRYSFE 551 

N LGY+L + R E L+ A+++NPDD A+ DS+GW Y +G A YLR + + 
Sbjct: 463 NALGYTLADRTTRYGEARELILKAHKLNPDDPAILDSMGWINYRQGKLADAERYLRQALQ 522 

Query: 552 NDPEPEVAAHLGEVLWALGERDQAVDVWTQAAHLRGDKKIWRETLKR 598 

P+ EVAAHLGEVLWA G+A+W+ +D+R T+KR 
Sbjct: 523 RYPDHEVAAHLGEVLWAQGRQGDARAIWREYLDKQPDSDVLRRTIKR 569 



gi|2983399 (AE000710) hypothetical protein [Aquifex aeolicus] Length = 545
Score = 81.5 bits (198), Expect = 1e-14

Identities = 61/198 (30%), Positives = 98/198 (48%), Gaps = 19/198 (9%)

Query: 408 GRYFTADNL-SKIQMLALSKLPDKREALIGLNNIIAKLSAAGSTEPLAEALAQ- 459
           G Y A L K ++LA PDK+E L + +K + + L +
Sbjct: [sequence illegible in source]

Query: 460 RSIIYEQFGKRGKMIADLETALKLTPDNAQIMNNLGYSLLS--DSKRLDEGFALLQ 513
           +I+Y+ G L A++L P+N N LGYSLL +R++E L++
Sbjct: [sequence illegible in source]

Query: 514 TAYQINPDDTAVNDSIGWAYYLKGDAESALPYLRYSF-ENDPEPEVAAHLGEVLWALGER 572
            A + +P++ A  DS+GW YYLKGD E A+ YL  +  E   +P V  H+G+VL  +G +
Sbjct: 451 KALEKDPENPAYIDSMGWVYYLKGDYERAMQYLLKALREAYDDPVVNEHVGDVLLKMGYK 510

Query: 573 DQAVDVWTQAAHLRGDKK 590
           ++A + + +A  L  + K
Sbjct: 511 EEARNYYERALKLLEEGK 528





Based on this analysis, it is predicted that the proteins from N. meningitidis and N. gonorrhoeae, and
their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies.



Example 7 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 49>: 

1 AACCTCTACG CCGGCCCGCA GACCACATCC GTCATCGCAA ACATCGCCGA 

51 CAACCTGCAA CTGGCCAAAG ACTACGGCAA AGTACACTGG TTCGCCTCCC 

101 CGCTCTTCTG GCTCCTGAAC CAACTGCACA ACATCATCGG CAACTGGGGC 

151 TGGGCGATTA TCGTTTTAAC CATCATCGTC AAAGCCGTAC TGTATCCATT 

201 GACCAACGCC TCTTACCGCT CTATGGCGAA AATGCGTGCC GCCGCACCCA 

251 AACTGCAAGC CATCAAAGAG AAATACGGCG ACGACCGTAT GGCGCAACAA 

301 CAGGCGATGA TGCAGCTTTA CACAGACGAG AAAATCAACC CG^CTGGGCG 

351 GCTGCCTGCC TATGCTGTTG CAAATCCCCG TCTTCATCGG ATTGTATTGG 

401 GCATTGTTCG CCTCCGTAGA ATTGCGCCAG GCACCTTGGC TGGGTTGGAT 

451 TACCGACCTC AGCCGCGCCG ACCCCTACTA CATCCTGCCC ATCATTATGG 

501 CGGCAACGAT GTTCGCCCAA ACTTATCTGA ACCCGCCGCC GAcCGACCCG 

551 ATGCagGCGA AAATGATGAA AATCATGCCG TTGGTTTTCT CsGwCrTGTT 

601 CTTCTTCTTC CCTGCCGGks TGGTATTGTA CTGGGTAGTC AACAACCTCC 

651 TGACCATCGC CCAGCAATGG CACATCAACC GCAGCATCGA AAAACAACGC 

701 GCCCAAGGCG AAGTCGTTTC CTAA 

This corresponds to the amino acid sequence <SEQ ID 50; ORF11>:

1 ..NLYAGPQTTS VIANIADNLQ LAKDYGKVHW FASPLFWLLN QLHNIIGNWG 
51 WAIIVLTIIV KAVLYPLTNA SYRSMAKMRA AAPKLQAIKE KYGDDRMAQQ
101 QAMMQLYTDE KINPLGGCLP MLLQIPVFIG LYWALFASVE LRQAPWLGWI
151 TDLSRADPYY ILPIIMAATM FAQTYLNPPP TDPMQAKMMK IMPLVFSXXF
201 FFFPAGXVLY WVVNNLLTIA QQWHINRSIE KQRAQGEVVS*

Further sequence analysis revealed the complete DNA sequence <SEQ ID 51>: 

1 ATGGATTTTA AAAGACTCAC GGCGTTTTTC GCCATCGCGC TGGTGATTAT 

51 GATCGGCTGG GAAAAGATGT TCCCCACTCC GAAGCCAGTC CCCGCGCCCC 

101 AACAGGCAGC ACAACAACAG GCCGTAACCG CTTCCGCCGA AGCCGCGCTC 

151 GCGCCCGCAA CGCCGATTAC CGTAACGACC GACACGGTTC AAGCCGTCAT 

201 TGATGAAAAA AGCGGCGACC TGCGCCGGCT GACCCTGCTC AAATACAAAG 

251 CAACCGGCGA CGAAAATAAA CCGTTCATCC TGTTTGGCGA CGGCAAAGAA 

301 TACACCTACG TCGCCCAATC CGAACTTTTG GACGCGCAGG GCAACAACAT 

351 TCTAAAAGGC ATCGGCTTTA GCGCACCGAA AAAACAGTAC AGCTTGGAAG 

401 GCGACAAAGT TGAAGTCCGC CTGAGCGCGC CTGAAACACG CGGTCTGAAA 

451 ATCGACAAAG TTTATACTTT CACCAAAGGC AGCTATCTGG TCAACGTCCG 

501 CTTCGACATC GCCAACGGCA GCGGTCAAAC CGCCAACCTG AGCGCGGACT 

551 ACCGCATCGT CCGCGACCAC AGCGAACCCG AGGGTCAAGG TTACTTTACC 

601 CACTCTTACG TCGGCCCTGT TGTTTATACC CCTGAAGGCA ACTTCCAAAA 

651 AGTCAGCTTT TCCGACTTGG ACGACGATGC CAAATCCGGC AAATCCGAGG 

701 CCGAATACAT CCGCAAAACC CCGACCGGCT GGCTCGGCAT GATTGAACAC 

751 CACTTCATGT CCACCTGGAT TCTCCAACCT AAAGGCAGAC AAAGCGTTTG 

801 CGCCGCAGGC GAGTGCAACA TCGACATCAA ACGCCGCAAC GACAAGCTGT 

851 ACAGCACCAG CGTCAGCGTG CCTTTAGCCG CCATCCAAAA CGGCGCGAAA 

901 GCCGAAGCCT CCATCAACCT CTACGCCGGC CCGCAGACCA CATCCGTCAT 

951 CGCAAACATC GCCGACAACC TGCAACTGGC CAAAGACTAC GGCAAAGTAC 
1001 ACTGGTTCGC CTCCCCGCTC TTCTGGCTCC TGAACCAACT GCACAACATC

1051 ATCGGCAACT GGGGCTGGGC GATTATCGTT TTAACCATCA TCGTCAAAGC 

1101 CGTACTGTAT CCATTGACCA ACGCCTCTTA CCGCTCTATG GCGAAAATGC 

1151 GTGCCGCCGC ACCCAAACTG CAAGCCATCA AAGAGAAATA CGGCGACGAC 

1201 CGTATGGCGC AACAACAGGC GATGATGCAG CTTTACACAG ACGAGAAAAT 

1251 CAACCCGCTG GGCGGCTGCC TGCCTATGCT GTTGCAAATC CCCGTCTTCA 

1301 TCGGATTGTA TTGGGCATTG TTCGCCTCCG TAGAATTGCG CCAGGCACCT 

1351 TGGCTGGGTT GGATTACCGA CCTCAGCCGC GCCGACCCCT ACTACATCCT 

1401 GCCCATCATT ATGGCGGCAA CGATGTTCGC CCAAACTTAT CTGAACCCGC 

1451 CGCCGACCGA CCCGATGCAG GCGAAAATGA TGAAAATCAT GCCGTTGGTT 

1501 TTCTCCGTCA TGTTCTTCTT CTTCCCTGCC GGTCTGGTAT TGTACTGGGT 

1551 AGTCAACAAC CTCCTGACCA TCGCCCAGCA ATGGCACATC AACCGCAGCA 

1601 TCGAAAAACA ACGCGCCCAA GGCGAAGTCG TTTCCTAA 

This corresponds to the amino acid sequence <SEQ ID 52; ORF11-1>:



1 MDFKRLTAFF AIALVIMIGW EKMFPTPKPV PAPQQAAQQQ AVTASAEAAL 

51 APATPITVTT DTVQAVIDEK SGDLRRLTLL KYKATGDENK PFILFGDGKE 

101 YTYVAQSELL DAQGNNILKG IGFSAPKKQY SLEGDKVEVR LSAPETRGLK 

151 IDKVYTFTKG SYLVNVRFDI ANGSGQTANL SADYRIVRDH SEPEGQGYFT 

201 HSYVGPVVYT PEGNFQKVSF SDLDDDAKSG KSEAEYIRKT PTGWLGMIEH

251 HFMSTWILQP KGRQSVCAAG ECNIDIKRRN DKLYSTSVSV PLAAIQNGAK 

301 AEASINLYAG PQTTSVIANI ADNLQLAKDY GKVHWFASPL FWLLNQLHNI 

351 IGNWGWAIIV LTIIVKAVLY PLTNASYRSM AKMRAAAPKL QAIKEKYGDD

401 RMAQQQAMMQ LYTDEKINPL GGCLPMLLQI PVFIGLYWAL FASVELRQAP

451 WLGWITDLSR ADPYYILPII MAATMFAQTY LNPPPTDPMQ AKMMKIMPLV

501 FSVMFFFFPA GLVLYWVVNN LLTIAQQWHI NRSIEKQRAQ GEVVS*

Computer analysis of this amino acid sequence gave the following results: 



Homology with a 60 kDa inner-membrane protein (accession P25754) of Pseudomonas putida
ORF11 and the 60 kDa protein show 58% aa identity in 229 aa overlap (BLASTp).



ORF11 2 LYAGPQTTSVIANIADNLQLAKDYGKVHWFASPLFWLLNQLHNIIGNWGWAIIVLTIIVK 61

LYAGP+ S + ++ L+L DYG + + A P+FWLL +H+++GNWGW+IIVLT+++K 
60K 324 LYAGPKIQSKLKELSPGLELTVDYGFLWFIAQPIFWLLQHIHSLLGNWGWSIIVLTMLIK 383 

ORF11 62 AVLYPLTNASYRSMAKMRAAAPKLQAIKEKYGDDRXXXXXXXXXLYTDEKINPLGGCLPM 121

+ +PL+ ASYRSMA+MRA APKL A+KE++GDDR LY EKINPLGGCLP+ 

60K 384 GLFFPLSAASYRSMARMRAVAPKLAALKERFGDDRQKMSQAMMELYKKEKINPLGGCLPI 443

ORF11 122 LLQIPVFIGLYWALFASVELRQAPWLGWITDLSRADPYYILPIIMAATMFAQTYLNPPPT 181

L+Q+PVF+ LYW L SVE+RQAPW+ WITDLS DP++ILPIIM ATMF Q LNP P 
60K 444 LVQMPVFLALYWVLLESVEMRQAPWILWITDLSIKDPFFILPIIMGATMFIQQRLNPTPP 503 

ORF11 182 DPMQAKMMKIMPLVXXXXXXXXPAGXVLYWVVNNLLTIAQQWHINRSIE 230

DPMQAK+MK+MP++ PAG VLYWVVNN L+I+QQW+I R IE

60K 504 DPMQAKVMKMMPIIFTFFFLWFPAGLVLYWVVNNCLSISQQWYITRRIE 552 



Homology with a predicted ORF from N. meningitidis (strain A) 

ORF11 shows 97.9% identity over a 240 aa overlap with an ORF (ORF11a) from strain A of N. meningitidis:



10 20 30 

orf11.pep NLYAGPQTTSVIANIADNLQLAKDYGKVHW

I I I I I I I I I I I I I I I ! I I I I I I II I I I I I 
orf11a IKRRNDKLYSTSVSVPLAAIQNGAKSXASINLYAGPQTTSVIANIADNLQLXKDYGKVHW
280 290 300 310 320 330 



40 50 60 70 80 90 

orf11.pep FASPLFWLLNQLHNIIGNWGWAIIVLTIIVKAVLYPLTNASYRSMAKMRAAAPKLQAIKE
I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
orf11a FASPLFWLLNQLHNIIGNWGWAIIVLTIIVKAVLYPLTNASYRSMAKMRAAAPKLQAIKE

340 350 360 370 380 390 
100 110 120 130 140 150

orf11.pep KYGDDRMAQQQAMMQLYTDEKINPLGGCLPMLLQIPVFIGLYWALFASVELRQAPWLGWI
I I I I II I I I I I I I I I I I I I I I i I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
orf11a KYGDDRMAQQQAMMQLYTDEKINPLGGCLPMLLQIPVFIGLYWALFASVELRQAPWLGWI
400 410 420 430 440 450 

160 170 180 190 200 210 

orf11.pep TDLSRADPYYILPIIMAATMFAQTYLNPPPTDPMQAKMMKIMPLVFSXXFFFFPAGXVLY
I | I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I Mill I I I I III 
orf11a TDLSRADPYYILPIIMAATMFAQTYLNPPPTDPMQAKMMKIMPLVXSXXFFXFPAGLVLY
460 470 480 490 500 510 



220 230 240 

orf11.pep WVVNNLLTIAQQWHINRSIEKQRAQGEVVSX
I I : I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
orf11a WVINNLLTIAQQWHINRSIEKQRAQGEVVSX
520 530 540 

The complete length ORF11a nucleotide sequence <SEQ ID 53> is:



1 ANGGATTTTA AAAGACTCAC NGNGTTTTTC GCCATCGCAC TGGTGATTAT

51 GATCGGATNG NAAANGATGT TCCCCACTCC GAAGCCCGTC CCCGCGCCCC

101 AACAGACGGC ACAACAACAG GCCGTAANCG CTTCCGCCGA AGCCGCGCTC

151 GCGCCCGNAN CGCCGATTAC CGTAACGACC GACACGGTTC AAGCCGTCAT

201 TGATGAAAAA AGCGGCGACC TGCGCCGGCT GACCCTGCTC AAATACAAAG

251 CAACCGGCGA CNAAAATAAA CCGTTCATCC TGTTTGGCGA CGGCAAANAA

301 TACACCTACN TCGCCCANTC CGAACTTTTG GACGCGCAGG GCAACAACAT

351 TCTAAAAGGC ATCGGCTTTA GCGCACCGAA AAAACAGTAC AGCTTGGAAG

401 GCGACAAAGT TGAAGTCCGC CTGAGCGCAC CTGAAACACG CGGTCTGAAA

451 ATCGACAAAG TTTATACTTT CACCAAAGGC AGCTATCTGG TCAACGTCCG

501 CTTCGACATC GCCAACGGCA GCGGTCAAAC CGCCAACCTG AGCGCGGACT

551 ACCGCATCGT CCGCGACCAC AGCGAACCCG AGGGTCAAGG CTACTTTACC

601 CACTCTTACG TCGGCCCTGT TGTTTATACC CCTGAAGGCA ACTTCCAAAA

651 AGTCAGCTTC TCCGACTTGG ACGACGATGC CAANTCCGGN AAATCCGAGG

701 CCGAATACAT CCGCAAAACC CNGACCGGCT GGCTCGGCAT GATTGAACAC

751 CACTTCATGT CCACCTGGAT CCTCCAACCC AAAGGCGGAC AAAGCGTTTG

801 CGCCGCTGGC GACTGCNGTA TNGACATCAA ACGCCGCAAC GACAAGCTGT

851 ACAGCACCAG CGTCAGCGTG CCTTTAGCCG CTATCCAAAA CGGTGCGAAA

901 TCCNAAGCCT CCATCAACCT CTACGCCGGC CCACAGACCA CATCNGTTAT

951 CGCAAACATC GCCGACAACC TGCAACTGGN CAAAGACTAC GGCAAAGTAC

1001 ACTGGTTCGC CTCCCCCCTC TTTTGGCTTT TGAACCAACT GCACAACATC

1051 ATCGGCAACT GGGGCTGGGC GATTATCGTT TTAACCATCA TCGTCAAAGC

1101 CGTACTGTAT CCATTGACCA ACGCCTCTTA CCGTTCGATG GCGAAAATGC

1151 GTGCCGCCGC GCCCAAACTG CAAGCCATCA AAGAGAAATA CGGCGACGAC

1201 CGTATGGCGC AGCAACAAGC CATGATGCAG CTTTACACAG ACGAGAAAAT

1251 CAACCCGCTG GGCGGCTGCC TGCCTATGCT GTTGCAAATC CCCGTCTTCA

1301 TCGGATTGTA TTGGGCATTG TTCGCCTCCG TAGAATTGCG CCAGGCACCT

1351 TGGCTGGGTT GGATTACCGA CCTCAGCCGC GCCGACCCNT ACTACATCCT

1401 GCCCATCATT ATGGCGGCAA CGATGTTCGC CCAAACCTAT CTGAACCCGC

1451 CGCCGACCGA CCCGATGCAG GCGAAAATGA TGAAAATCAT GCCTTTGGTT

1501 NTNTCNNNNA NGTTCTTCNN CTTCCCTGCC GGTCTGGTAT TGTACTGGGT

1551 GATCAACAAC CTCCTGACCA TCGCCCAGCA ATGGCACATC AACCGCAGCA

1601 TCGAAAAACA ACGCGCCCAA GGCGAAGTCG TTTCCTAA



This encodes a protein having amino acid sequence <SEQ ID 54>: 



1 XDFKRLTXFF AIALVIMIGX XXMFPTPKPV PAPQQTAQQQ AVXASAEAAL 

51 APXXPITVTT DTVQAVIDEK SGDLRRLTLL KYKATGDXNK PFILFGDGKX 

101 YTYXAXSELL DAQGNNILKG IGFSAPKKQY SLEGDKVEVR LSAPETRGLK 

151 IDKVYTFTKG SYLVNVRFDI ANGSGQTANL SADYRIVRDH SEPEGQGYFT 

201 HSYVGPVVYT PEGNFQKVSF SDLDDDAXSG KSEAEYIRKT XTGWLGMIEH

251 HFMSTWILQP KGGQSVCAAG DCXXDIKRRN DKLYSTSVSV PLAAIQNGAK 

301 SXASINLYAG PQTTSVIANI ADNLQLXKDY GKVHWFASPL FWLLNQLHNI 

351 IGNWGWAIIV LTIIVKAVLY PLTNASYRSM AKMRAAAPKL QAIKEKYGDD

401 RMAQQQAMMQ LYTDEKINPL GGCLPMLLQI PVFIGLYWAL FASVELRQAP

451 WLGWITDLSR ADPYYILPII MAATMFAQTY LNPPPTDPMQ AKMMKIMPLV

501 XSXXFFXFPA GLVLYWVINN LLTIAQQWHI NRSIEKQRAQ GEVVS*

ORF11a and ORF11-1 show 95.2% identity in 544 aa overlap:

10 20 30 40 50 60 

orf11a.pep XDFKRLTXFFAIALVIMIGXXXMFPTPKPVPAPQQTAQQQAVXASAEAALAPXXPITVTT
I I I I I I I I I I I I I I I I I I I I I I I II I I I I I : I I I I I I : I J I I I I I I I : I I I I I I 
orf11-1 MDFKRLTAFFAIALVIMIGWEKMFPTPKPVPAPQQAAQQQAVTASAEAALAPATPITVTT

10 20 30 40 50 60 

70 80 90 100 110 120 

orf11a.pep DTVQAVIDEKSGDLRRLTLLKYKATGDXNKPFILFGDGKXYTYXAXSELLDAQGNNILKG
II I I I I I I I I I I I I I I 1 I I I I I I I I I I I I I I I I I I I I I III I I I I I I I I I I I I I I I 
orf11-1 DTVQAVIDEKSGDLRRLTLLKYKATGDENKPFILFGDGKEYTYVAQSELLDAQGNNILKG

70 80 90 100 110 120 

130 140 150 160 170 180 

orf11a.pep IGFSAPKKQYSLEGDKVEVRLSAPETRGLKIDKVYTFTKGSYLVNVRFDIANGSGQTANL
I I I I I I I I I I I II I I I I II I I I I I II I I II I I I I I I I I I II I I I I I I I I I I I I I I I I II I 
orf11-1 IGFSAPKKQYSLEGDKVEVRLSAPETRGLKIDKVYTFTKGSYLVNVRFDIANGSGQTANL

130 140 150 160 170 180 

190 200 210 220 230 240 

orf11a.pep SADYRIVRDHSEPEGQGYFTHSYVGPVVYTPEGNFQKVSFSDLDDDAXSGKSEAEYIRKT
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I II I I I I I I I 
orf11-1 SADYRIVRDHSEPEGQGYFTHSYVGPVVYTPEGNFQKVSFSDLDDDAKSGKSEAEYIRKT

190 200 210 220 230 240 

250 260 270 280 290 300 

orf11a.pep XTGWLGMIEHHFMSTWILQPKGGQSVCAAGDCXXDIKRRNDKLYSTSVSVPLAAIQNGAK
I I I I I I I I I 1 I I I I I I I I I I I 1111111:1 M I I I I I I I I I I I I I I I I I I I I I I LI 
orf11-1 PTGWLGMIEHHFMSTWILQPKGRQSVCAAGECNIDIKRRNDKLYSTSVSVPLAAIQNGAK

250 260 270 280 290 300 

310 320 330 340 350 360 

orf11a.pep SXASINLYAGPQTTSVIANIADNLQLXKDYGKVHWFASPLFWLLNQLHNIIGNWGWAIIV

: I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I II 
orf11-1 AEASINLYAGPQTTSVIANIADNLQLAKDYGKVHWFASPLFWLLNQLHNIIGNWGWAIIV

310 320 330 340 350 360 

370 380 390 400 410 420 

orf11a.pep LTIIVKAVLYPLTNASYRSMAKMRAAAPKLQAIKEKYGDDRMAQQQAMMQLYTDEKINPL

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
orf11-1 LTIIVKAVLYPLTNASYRSMAKMRAAAPKLQAIKEKYGDDRMAQQQAMMQLYTDEKINPL

370 380 390 400 410 420 

430 440 450 460 470 480 

orf11a.pep GGCLPMLLQIPVFIGLYWALFASVELRQAPWLGWITDLSRADPYYILPIIMAATMFAQTY

M I I II I I M I I I I I I I I I II M 1 M i I I I I M I If I I I I I I I I i I I I M I II I I 

orf11-1 GGCLPMLLQIPVFIGLYWALFASVELRQAPWLGWITDLSRADPYYILPIIMAATMFAQTY

430 440 450 460 470 480 

490 500 510 520 530 540 

orf11a.pep LNPPPTDPMQAKMMKIMPLVXSXXFFXFPAGLVLYWVINNLLTIAQQWHINRSIEKQRAQ
I I I I I I II I I I I I II I I II I I II I I I I I I I II 1 : I I I I II I I II I I I I II II I I I I 
orf11-1 LNPPPTDPMQAKMMKIMPLVFSVMFFFFPAGLVLYWVVNNLLTIAQQWHINRSIEKQRAQ

490 500 510 520 530 540 



orf11a.pep GEVVSX
I I I I I I
orf11-1 GEVVSX



Homology with a predicted ORF from N. gonorrhoeae

ORF11 shows 96.3% identity over a 240 aa overlap with a predicted ORF (ORF11.ng) from N. gonorrhoeae:






orf11 NLYAGPQTTSVIANIADNLQLAKDYGKVHWFASPLFWLLNQLHNIIGNWGWAIIVLT 57

I | II I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I: I I I 
orf11ng MAVNLYAGPQTTSVIANIADNLQLAKDYGKVHWFASPLFWLLNQLHNIIGNWGWAIVVLT 60
orf11 IIVKAVLYPLTNASYRSMAKMRAAAPKLQAIKEKYGDDRMAQQQAMMQLYTDEKINPLGG 117

I I I I I I I I I [ ! I I I I I I I I I I I I I I I : I I : I I I I I I I I I I I I I I I I I I I : 11:111111 

orf11ng IIVKAVLYPLTNASYRSMAKMRAAAPELQTIKEKYGDDRMAQQQAMMQLFEDEEINPLGG 120

orf11 CLPMLLQIPVFIGLYWALFASVELRQAPWLGWITDLSRADPYYILPIIMAATMFAQTYLN 177

I I I f I I I I I I I I I I I I I I I 1 I I I I 1 I I I I I I I I I I I II I I 1 I 1 I I I I I I i I I I I I I I I I I 

orf11ng CLPMLLQIPVFIGLYWALFASVELRQAPWLGWITDLSRADPYYILPIIMAATMFAQTYLN 180

orf11 PPPTDPMQAKMMKIMPLVFSXXFFFFPAGXVLYWVVNNLLTIAQQWHINRSIEKQRAQGE 237

I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

orf11ng PPPTDPMQAKMMKIMPLVFSVMFFFFPAGLVLYWVVNNLLTIAQQWHINRSIEKQRAQGE 240



orf11 VVS 240

I I I 



orf11ng VVS 243

An ORF11ng nucleotide sequence <SEQ ID 55> was predicted to encode a protein having amino
acid sequence <SEQ ID 56>: 



1 MAVNLYAGPQ TTSVIANIAD NLQLAKDYGK VHWFASPLFW LLNQLHNIIG 

51 NWGWAIVVLT IIVKAVLYPL TNASYRSMAK MRAAAPELQT IKEKYGDDRM

101 AQQQAMMQLF EDEEINPLGG CLPMLLQIPV FIGLYWALFA SVELRQAPWL

151 GWITDLSRAD PYYILPIIMA ATMFAQTYLN PPPTDPMQAK MMKIMPLVFS

201 VMFFFFPAGL VLYWVVNNLL TIAQQWHINR SIEKQRAQGE VVS*

Further sequence analysis revealed the complete gonococcal DNA sequence <SEQ ID 57> to be: 



1 ATGGATTTTA AAAGACTCAC GGCGTTTTTC GCCATCGCGC TGGTGATTAT

51 GATCGGCTGG GAAAAAATGT TCCCCACCCC GAAACCCGTC CCCGCGCCCC

101 AACAGGCGGC ACAAAAACAG GCAGCAACCG CTTCCGCCGA AGCCGCGCTC

151 GCGCCCGCAA CGCCGATTAC CGTAACGACC GACACGGTTC AAGCCGTTAT

201 TGATGAAAAA AGTGGCGACC TGCGCCGGCT GACCCTGCTC AAATACAAAG

251 CAACCGGCGA CGAAAACAAA CCGTTCGTCC TGTTTGGCGA CGGCAAAGAA

301 TACACCTACG TCGCCCAATC CGAACTTTTG GACGCGCAGG GCAACAACAT

351 TCTGAAAGGC ATCGGCTTTA GCGCACCGAA AAAACAGTAC ACCCTCAACG

401 GCGACACAGT CGAAGTCCGC CTGAGCGCGC CCGAAACCAA CGGACTGAAA

451 ATCGACAAAG TCTATACCTT TACCAAAGAC AGCTATCTGG TCAACGTCCG

501 CTTCGACATC GCCAACGGCA GCGGTCAAAC CGCCAACCTG AGCGCGGACT

551 ACCGCATCGT CCGCGACCAC AGCGAACCCG AGGGTCAAGG CTACTTTACC

601 CACTCTTACG TCGGCCCTGT TGTTTATACC CCTGAAGGCA ACTTCCAAAA

651 AGTCAGCTTC TCCgacTTgg acgACGATGC gaaaTccggc aaATccgagg

701 ccgaatacaT CCGCAAAACC ccgaccggtt ggctcggcat gattgaacac

751 cacttcatgt ccacctggat cctccAAcct aaaggcggcc aaaacgtttg

801 cgcccaggga gactgccgta tcgacattaa aCgccgcaac gacaagctgt

851 acagcgcaag cgtcagcgtg cctttaaccg ctatcccaac ccgggggcca

901 aaaccgaaaa tggcggTCAA CCTGTATGCC GGTCCGCAAA CCACATCCGT

951 TATCGCAAAC ATCGCcgacA ACCTGCAACT GGCAAAAGAC TACGGTAAAG

1001 TACACTGGTT CGCATCGCCG CTCTTCTGGC TCCTGAACCA ACTGCACAAC

1051 ATTATCGGCA ACTGGGGCTG GGCAATCGTC GTTTTGACCA TCATCGTCAA

1101 AGCCGTACTG TATCCATTGA CCAACGcctc ctACCGTTCG ATGGCGAAAA

1151 TGCGTGccgc cgcacCcaaA CTGCAGACCA TCAAAGAAAA ATAcgGCGAC

1201 GACCGTATGG CGCAACAGCA AGCGATGATG CAGCTTTACA AAgacgAGAA

1251 AATCAACCCG CTGGGCGGCT GTctgcctat gctgttgCAA ATCCCCGTCT

1301 TCATCGGCTT GTACTGGGCA TTGTTCGCCT CCGTAGAATT GCGCCAGGCA

1351 CCTTGGCTGG GCTGGATTAC CGACCTCAGC CGCGCCGACC CCTACTACAT

1401 CCTGCCCATC ATTATGGCGG CAACGATGTT CGCCCAAACC TATCTGAACC

1451 CGCCGCCGAC CGACCCGATG CAGGCGAAAA TGATGAAAAT CATGCCGTTG

1501 GTTTTCTCCG TCATGTTCTT CTTCTTCCCT GCCGGTTTGG TTCTCTACTG

1551 GGTGGTCAAC AACCTCCTGA CCATCGCCCA GCAGTGGCAC ATCAACCGCA

1601 GCATCGAAAA ACAACGCGCC CAAGGCGAAG TCGTTTCCTA A

This encodes a protein having amino acid sequence <SEQ ID 58; ORF11ng-1>:



1 MDFKRLTAFF AIALVIMIGW EKMFPTPKPV PAPQQAAQKQ AATASAEAAL 

51 APATPITVTT DTVQAVIDEK SGDLRRLTLL KYKATGDENK PFVLFGDGKE 

101 YTYVAQSELL DAQGNNILKG IGFSAPKKQY TLNGDTVEVR LSAPETNGLK 

151 IDKVYTFTKD SYLVNVRFDI ANGSGQTANL SADYRIVRDH SEPEGQGYFT 

201 HSYVGPVVYT PEGNFQKVSF SDLDDDAKSG KSEAEYIRKT PTGWLGMIEH

251 HFMSTWILQP KGGQNVCAQG DCRIDIKRRN DKLYSASVSV PLTAIPTRGP 

301 KPKMAVNLYA GPQTTSVIAN IADNLQLAKD YGKVHWFASP LFWLLNQLHN

351 IIGNWGWAIV VLTIIVKAVL YPLTNASYRS MAKMRAAAPK LQTIKEKYGD

401 DRMAQQQAMM QLYKDEKINP LGGCLPMLLQ IPVFIGLYWA LFASVELRQA

451 PWLGWITDLS RADPYYILPI IMAATMFAQT YLNPPPTDPM QAKMMKIMPL

501 VFSVMFFFFP AGLVLYWVVN NLLTIAQQWH INRSIEKQRA QGEVVS*

ORF11ng-1 and ORF11-1 show 95.1% identity in 546 aa overlap:

10 20 30 40 50 60

orf11ng-1.pep MDFKRLTAFFAIALVIMIGWEKMFPTPKPVPAPQQAAQKQAATASAEAALAPATPITVTT
I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I : I I: I I I I I I I I I I I I I I I I I I
orf11-1 MDFKRLTAFFAIALVIMIGWEKMFPTPKPVPAPQQAAQQQAVTASAEAALAPATPITVTT

10 20 30 40 50 60

70 80 90 100 110 120

orf11ng-1.pep DTVQAVIDEKSGDLRRLTLLKYKATGDENKPFVLFGDGKEYTYVAQSELLDAQGNNILKG
I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I : I I I I I I I I I II II I I II I I I I I I I I II
orf11-1 DTVQAVIDEKSGDLRRLTLLKYKATGDENKPFILFGDGKEYTYVAQSELLDAQGNNILKG

70 80 90 100 110 120



130 140 150 160 170 180

orf11ng-1.pep IGFSAPKKQYTLNGDTVEVRLSAPETNGLKIDKVYTFTKDSYLVNVRFDIANGSGQTANL
I I I I I I I I I I : I : I I 1 I I I I I 1 I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I II
orf11-1 IGFSAPKKQYSLEGDKVEVRLSAPETRGLKIDKVYTFTKGSYLVNVRFDIANGSGQTANL

130 140 150 160 170 180



190 200 210 220 230 240

orf11ng-1.pep SADYRIVRDHSEPEGQGYFTHSYVGPVVYTPEGNFQKVSFSDLDDDAKSGKSEAEYIRKT
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I i I I I II I I I I I I I I
orf11-1 SADYRIVRDHSEPEGQGYFTHSYVGPVVYTPEGNFQKVSFSDLDDDAKSGKSEAEYIRKT

190 200 210 220 230 240

250 260 270 280 290 300

orf11ng-1.pep PTGWLGMIEHHFMSTWILQPKGGQNVCAQGDCRIDIKRRNDKLYSASVSVPLTAIPTRGP
I I I I I 1 I I I I I I I I I I I II t I I 1:111 hi I I I I I I I I I I I I : I I M M : I I : I
orf11-1 PTGWLGMIEHHFMSTWILQPKGRQSVCAAGECNIDIKRRNDKLYSTSVSVPLAAIQN-GA

250 260 270 280 290



310 320 330 340 350 360 

orf11ng-1.pep KPKMAVNLYAGPQTTSVIANIADNLQLAKDYGKVHWFASPLFWLLNQLHNIIGNWGWAIV
I : : : I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I : 
orf11-1 KAEASINLYAGPQTTSVIANIADNLQLAKDYGKVHWFASPLFWLLNQLHNIIGNWGWAII

300 310 320 330 340 350 






370 380 390 400 410 420 

orf11ng-1.pep VLTIIVKAVLYPLTNASYRSMAKMRAAAPKLQTIKEKYGDDRMAQQQAMMQLYKDEKINP
I I I I I I II I I II I I I II I I I I I I I I II I I I I I : I I I I I I I I I I I I I I I I I I I I I I I I I I 
orf11-1 VLTIIVKAVLYPLTNASYRSMAKMRAAAPKLQAIKEKYGDDRMAQQQAMMQLYTDEKINP
360 370 380 390 400 410 






430 440 450 460 470 480 

orf11ng-1.pep LGGCLPMLLQIPVFIGLYWALFASVELRQAPWLGWITDLSRADPYYILPIIMAATMFAQT
I I I I I II I I I I I I I I I I I I I I I I II I I I I I II I I I I I I I I I M II I I I I I II I II I I I I I 
orf11-1 LGGCLPMLLQIPVFIGLYWALFASVELRQAPWLGWITDLSRADPYYILPIIMAATMFAQT

420 430 440 450 460 470 






490 500 510 520 530 540 

orf11ng-1.pep YLNPPPTDPMQAKMMKIMPLVFSVMFFFFPAGLVLYWVVNNLLTIAQQWHINRSIEKQRA
I I I I I I I I I I I I I I I 1 I II I I I I I I I I I I I I II II I I I II I I I I I I I I I I I I I 1 I I I I I I 
orf11-1 YLNPPPTDPMQAKMMKIMPLVFSVMFFFFPAGLVLYWVVNNLLTIAQQWHINRSIEKQRA

480 490 500 510 520 530 



orf11ng-1.pep QGEVVSX
I I I I I I I
orf11-1 QGEVVSX
540 



In addition, ORF11ng-1 shows significant homology with an inner-membrane protein from the
database (accession number P25754):
ID 60IM_PSEPU STANDARD; PRT; 560 AA.

AC P25754; 

DT 01-MAY-1992 (REL. 22, CREATED) 

DT 01-MAY-1992 (REL. 22, LAST SEQUENCE UPDATE) 

DT 01-NOV-1995 (REL. 32, LAST ANNOTATION UPDATE) 

DE 60 KD INNER-MEMBRANE PROTEIN. . . . 



SCORES Init1: 1074 Initn: 1293 Opt: 1103

Smith-Waterman score: 1406; 41.5% identity in 574 aa overlap 



10 20 30 40 

orf11ng-1.pep MDFKR LTAFFAIALVIMIGW EKMFPT PKPVPAPQQAAQKQ

11:11 ::|: ::: I::: I : :|| I 111 :::|: : 

p25754 MDIKRTILIAALAVVSYVMVLKWNDDYGQAALPTQNTAASTVAPGLPDGVPAGNNGASAD 

10 20 30 40 50 60 

50 60 70 80 90 

orf11ng-1.pep AATASAEAALAPATPIT VTTDTVQAVIDEKSGDLRRLTLLKYKATGDE-NKPF

: : I : t I : : I :|:: I ||::: :!l :||: :|:| II hill 

p25754 VPSANAESSPAELAPVALSKDLIRVKTDVLELAIDPVGGDIVQLNLPKYPRRQDHPNIPF 

70 80 90 100 110 120 



100 110 120 130 140 

orf11ng-1.pep VLFGDGKEYTYVAQSELLDAQGNNILKGIG FSAPKKQYTL-NGD TVEVRLSAPE

I I : I I : I : I I I I : : I : : : I : : I : I : I I : I : : I : : : : I 

p25754 QLFDNGGERVYLAQSGLTGTDGPDA-RASGRPLYAAEQKSYQLADGQEQLWDLKFS 

130 140 150 160 170 



150 160 170 180 190 200 

orf11ng-1.pep TNGLKIDKVYTFTKDSYLVNVRFDIANGSGQTANLSADYRIVRDHS-EPEGQGYF-THSY

II:: | : : I : I : I I : I I i I I : I : : : | | I : I : : I : I 
p25754 DNGVNYIKRFSFKRGEYDLNVSYLIDNQSGQAWNGNMFAQLKRDASGDPSSSTATGTATY 

180 190 200 210 220 230 



210 220 230 240 250 260 

orf11ng-1.pep VGPVVYTPEGNFQKVSFSDLDDDAKSGKSEAEYIRKTPTGWLGMIEHHFMSTWILQPKGG
: I : : : I : : I I I : : I : I I : : : I : : I I : : : : I : I : : : I I I : 

p25754 LGAALWTASEPYKKVSMKDID KGSLKE NV S GGW VAWLQHY FVTAWI - PAKS D 

240 250 260 270 280 



270 280 290 300 310 320 

orf11ng-1.pep QNVCAQGDCRIDIKRRNDKLYSASVSVPLTAIPTRGPKPKMAVNLYAGPQTTSVIANIAD

: | | :::::: I : : | : : : | : | | : : : | | | | | : | : : : : 

p25754 NNV VQTRKDSQGNYI IGYTGPVI SVPA-GGKVETSALLYAGPKIQSKLKELS P 

290 300 310 320 330 



330 340 350 360 370 380 

orf11ng-1.pep NLQLAKDYGKVHWF-ASPLFWLLNQLHNIIGNWGWAIVVLTIIVKAVLYPLTNASYRSMA
: I : I : II I : II I : I : I I I I : : : I : : : I I I I I : I : I I I : : : I : : : : I I : I I II II I 
p25754 GLELTVDYGFL-WFIAQPIFWLLQHIHSLLGNWGWSIIVLTMLIKGLFFPLSAASYRSMA 
340 350 360 370 380 390 



390 400 410 420 430 440 

orf11ng-1.pep KMRAAAPKLQTIKEKYGDDRMAQQQAMMQLYKDEKINPLGGCLPMLLQIPVFIGLYWALF
: I I I : I I II : : I I : : II I I : : : I I I I : I I I I II I I I I I I I I : I : I : I I I : : I I I : I : 
p25754 RMRAVAPKLAALKERFGDDRQKMSQAMMELYKKEKINPLGGCLPILVQMPVFLALYWVLL 
400 410 420 430 440 450 



450 460 470 480 490 500 

orf11ng-1.pep ASVELRQAPWLGWITDLSRADPYYILPIIMAATMFAQTYLNPPPTDPMQAKMMKIMPLVF
111:11111: I I 1 I t I I I : : I I I I i I : I I M I III I I I I I I I : I I : I I : : I 
p25754 ESVEMRQAPWILWITDLSIKDPFFILPIIMGATMFIQQRLNPTPPDPMQAKVMKMMPIIF 
460 470 480 490 500 510 



510 520 530 540 

orf11ng-1.pep SVMFFFFPAGLVLYWVVNNLLTIAQQWHINRSIEKQRAQGEVVSX

: : I : : 1 I I I I I I II I I I I I : I : I I I t I : I II 
p25754 TFFFLWFPAGLVLYWVVNNCLSISQQWYITRRIEAATKKAAA
520 530 540 550 560 
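The Smith-Waterman score reported above is the best local-alignment score under the search program's own substitution matrix and gap penalties, which are not stated here. The sketch below shows only the underlying dynamic-programming recurrence, using a deliberately simple, hypothetical scheme (+3 for identical residues, -3 otherwise, and a linear cost of 2 per gap position); the function name is illustrative and it will not reproduce the score of 1406.

```python
def smith_waterman_score(a: str, b: str, match: int = 3, mismatch: int = -3, gap: int = 2) -> int:
    """Best local alignment score with a linear gap penalty (toy scoring, no matrix)."""
    prev = [0] * (len(b) + 1)  # DP row for the previous residue of `a`
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, hence the floor of 0.
            curr[j] = max(0, diag, prev[j] - gap, curr[j - 1] - gap)
            best = max(best, curr[j])
        prev = curr
    return best


print(smith_waterman_score("ACGT", "ACT"))  # 7
```

The example scores 7 by aligning ACGT against AC-T: three identities (+9) and one gap position (-2). Real searches replace the match/mismatch constants with a substitution matrix and usually use affine rather than linear gap costs.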
Based on this analysis, including the homology to an inner-membrane protein from P. putida and
the predicted transmembrane domains (seen in both the meningococcal and gonococcal proteins), it
is predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be
useful antigens for vaccines or diagnostics, or for raising antibodies.

Example 8 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 59>:

1 . . GCCGTCTTAA TCATCGAATT ATTGACGGGA ACGGTTTATC TTTTGGTTGT 

51 NAGCGCGGCT TTGGCGGGTT CGGGCATTGC TTACGGGCTG ACCGGCAGTA 

101 CGCCTGCCGC CGTCTTGACC GNCGCTCTGC TTTCCGCGCT GGGTATTTNG 

151 TTCGTACACG CCAAAACCGC CGTTAGAAAA GTTGAAACGG ATTCATATCA 

201 GGATTTGGAT GCCGGACAAT ATGTCGAAAT CCTCCGNCAC ACAGGCGGCA 

251 ACCGTTACGA AGTT . TTTAT CGCGGTACG. ACTGGCAGGC TCAAAATACG 

301 GGGCAAGAAG AGCTTGAACC AGGAACTCGC GCCCTCATTG TCCGCAAGGA 

351 AGGGAACCTT CTTATTATCA CACACCCTTA A 

This corresponds to the amino acid sequence <SEQ ID 60; ORF13>: 

1 ..AVLIIELLTG TVYLLVVSAA LAGSGIAYGL TGSTPAAVLT XALLSALGIX
51 FVHAKTAVRK VETDSYQDLD AGQYVEILRH TGGNRYEVXY RGTXWQAQNT 
101 GQEELEPGTR ALIVRKEGNL LIITHP* 

Further sequence analysis refined the DNA sequence slightly <SEQ ID 61>:

1 . . GCCGTCTTAA TCATCGAATT ATTGACGGGA ACGGTTTATC TTTTGGTTGT 

51 nAGCGCGGCT TTGGCGGGTT CGGGCATTGC TTACGGGCTG ACCGGCAGTA 

101 CGCCTGCCGC CGTCTTGACC GnCGCTCTGC TTTCCGCGCT GGGTATTTnG 

151 TTCGTACACG CCAAAACCGC CGTTAGAAAA GTTGAAACGG ATTCATATCA 

201 GGATTTGGAT GCCGGACAAT ATGTCGAAAT CCTCCGACAC ACAGGCGGCA 

251 ACCGTTACGA AGTTTTtTAT CGCGGTACGc ACTGGCAGGC TCAAAATACG 

301 GGGCAAGAAG AGCTTGAACC AGGAACTCGC GCCCTCATTG TCCGCAAGGA 

351 AGGCAACCTT CTTATTATCA CACACCCTTA A 

This corresponds to the amino acid sequence <SEQ ID 62; ORF13-1>:

1 ..AVLIIELLTG TVYLLVVSAA LAGSGIAYGL TGSTPAAVLT XALLSALGIX
51 FVHAKTAVRK VETDSYQDLD AGQYVEILRH TGGNRYEVFY RGTHWQAQNT 
101 GQEELEPGTR ALIVRKEGNL LIITHP* 

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A)

ORF13 shows 92.9% identity over a 126 aa overlap with an ORF (ORF13a) from strain A of N. meningitidis:

10 20 30 40 50 

orf13.pep AVLIIELLTGTVYLLVVSAALAGSGIAYGLTGSTPAAVLTXALLSALGIXF

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
orf13a MTVWFVAAVAVLIIELLTGTVYLLVVSAALAGSGIAYGLTGSTPAAVLTAALLSALGIWF

10 20 30 40 50 60 

60 70 80 90 100 110 

orf13.pep VHAKTAVRKVETDSYQDLDAGQYVEILRHTGGNRYEVXYRGTXWQAQNTGQEELEPGTRA
Mill II I Ml MM II I I 111:11 III: II II I II MM I I I I I II II II II I I I I 
orf13a VHAKTAVGKVETDSYQDLDAGQYAEILRHAGGNRYEVFYRGTHWQAQNTGQEELEPGTRA

70 80 90 100 110 120 

120 

orf13.pep LIVRKEGNLLIITHPX
I I I I I I I M I ! I : • I I 
orf13a LIVRKEGNLLIIAKPX

130 

The complete length ORF13a nucleotide sequence <SEQ ID 63> is: 

1 ATGACTGTAT GGTTTGTTGC CGCTGTTGCC GTCTTAATCA TCGAATTATT 

51 GACGGGAACG GTTTATCTTT TGGTTGTCAG CGCGGCTTTG GCGGGTTCGG 

101 GCATTGCTTA CGGGCTGACC GGCAGCACGC CTGCCGCCGT CTTGACCGCC 

151 GCTCTGCTTT CCGCGCTGGG TATTTGGTTC GTACACGCCA AAACCGCCGT 

201 GGGAAAAGTT GAAACGGATT CATATCAGGA TTTGGATGCC GGGCAATATG 

251 CCGAAATCCT CCGGCACGCA GGCGGCAACC GTTACGAAGT TTTTTATCGC 

301 GGTACGCACT GGCAGGCTCA AAATACGGGG CAAGAAGAGC TTGAACCAGG 

351 AACGCGCGCC CTAATCGTCC GCAAGGAAGG CAACCTTCTT ATCATCGCAA 

401 AACCTTAA 

This encodes a protein having amino acid sequence <SEQ ID 64>:



1 MTVWFVAAVA VLIIELLTGT VYLLVVSAAL AGSGIAYGLT GSTPAAVLTA
51 ALLSALGIWF VHAKTAVGKV ETDSYQDLDA GQYAEILRHA GGNRYEVFYR 
101 GTHWQAQNTG QEELEPGTRA LIVRKEGNLL IIAKP* 
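Every row of these listings starts with the 1-based position of its first base, so the numbering doubles as a consistency check on a transcribed sequence. The sketch below uses a hypothetical helper, `parse_listing`, and assumes rows of the form shown (a position followed by space-separated groups of bases). It rebuilds SEQ ID 63 and checks each row offset: ORF13a comes to 408 nt, i.e. 136 codons, which is 135 residues plus the TAA stop and agrees with the 135-residue SEQ ID 64.

```python
import re

# Rows of the ORF13a listing (SEQ ID 63) as printed above.
ORF13A_LISTING = """\
1 ATGACTGTAT GGTTTGTTGC CGCTGTTGCC GTCTTAATCA TCGAATTATT
51 GACGGGAACG GTTTATCTTT TGGTTGTCAG CGCGGCTTTG GCGGGTTCGG
101 GCATTGCTTA CGGGCTGACC GGCAGCACGC CTGCCGCCGT CTTGACCGCC
151 GCTCTGCTTT CCGCGCTGGG TATTTGGTTC GTACACGCCA AAACCGCCGT
201 GGGAAAAGTT GAAACGGATT CATATCAGGA TTTGGATGCC GGGCAATATG
251 CCGAAATCCT CCGGCACGCA GGCGGCAACC GTTACGAAGT TTTTTATCGC
301 GGTACGCACT GGCAGGCTCA AAATACGGGG CAAGAAGAGC TTGAACCAGG
351 AACGCGCGCC CTAATCGTCC GCAAGGAAGG CAACCTTCTT ATCATCGCAA
401 AACCTTAA
"""

# A row is a 1-based start position followed by groups of bases.
ROW = re.compile(r"^\s*(\d+)\s+([A-Za-z ]+?)\s*$")


def parse_listing(text: str) -> str:
    """Join a numbered listing into one sequence, checking each row's offset."""
    seq = ""
    for line in text.splitlines():
        if not line.strip():
            continue  # blank separator lines carry no sequence
        match = ROW.match(line)
        if match is None:
            raise ValueError(f"unrecognised row: {line!r}")
        start = int(match.group(1))
        if start != len(seq) + 1:
            raise ValueError(f"row starts at {start}, expected {len(seq) + 1}")
        seq += match.group(2).replace(" ", "").upper()
    return seq


seq = parse_listing(ORF13A_LISTING)
codons = len(seq) // 3
print(len(seq), codons, seq[-3:])  # 408 136 TAA
```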

ORF13a and ORF13-1 show 94.4% identity in 126 aa overlap:



10 20 30 40 50 60 

orf13a.pep MTVWFVAAVAVLIIELLTGTVYLLVVSAALAGSGIAYGLTGSTPAAVLTAALLSALGIWF

I I I I I I I I I I I I I I I I I 1 I I I I I I I ! I I t I I I I I I I I I I I I I I I I I II I 
orf13-1 AVLIIELLTGTVYLLVVSAALAGSGIAYGLTGSTPAAVLTXALLSALGIXF

10 20 30 40 50 



70 80 90 100 110 120 

orf13a.pep VHAKTAVGKVETDSYQDLDAGQYAEILRHAGGNRYEVFYRGTHWQAQNTGQEELEPGTRA
II Mill I I I I I I M II I II I I : I M I I : II 1 I I I I I I I I I I I I I I I I I I I I I I M I II 
orf13-1 VHAKTAVRKVETDSYQDLDAGQYVEILRHTGGNRYEVFYRGTHWQAQNTGQEELEPGTRA

60 70 80 90 100 110 



130 

orf13a.pep LIVRKEGNLLIIAKPX

I I I II I I I I I I I:: I I 
orf13-1 LIVRKEGNLLIITHPX

120 
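The identity figures quoted for these gap-free alignments appear to follow the simple ratio of identical positions to overlap length, with undetermined residues (X) counted as mismatches. A minimal sketch under that assumption (the function name is illustrative): ORF13-1 differs from residues 10-135 of ORF13a at 7 of 126 positions, giving 119/126 = 94.4%, and the two extra X residues in the original ORF13 read lower this to 117/126 = 92.9%, the figure given for ORF13 against ORF13a.

```python
def percent_identity(a: str, b: str) -> float:
    """Percentage of identical positions over an ungapped overlap.

    Assumption: undetermined residues ('X') never count as identical.
    """
    if len(a) != len(b):
        raise ValueError("expected an ungapped overlap of equal length")
    same = sum(1 for x, y in zip(a, b) if x == y and x != "X")
    return 100.0 * same / len(a)


# ORF13-1 (SEQ ID 62) and residues 10-135 of ORF13a (SEQ ID 64), as aligned above.
# 'VV' at positions 16-17 follows the codons GTT GT(C/N) in the DNA.
ORF13_1 = (
    "AVLIIELLTG" "TVYLLVVSAA" "LAGSGIAYGL" "TGSTPAAVLT" "XALLSALGIX"
    "FVHAKTAVRK" "VETDSYQDLD" "AGQYVEILRH" "TGGNRYEVFY" "RGTHWQAQNT"
    "GQEELEPGTR" "ALIVRKEGNL" "LIITHP"
)
ORF13A_10_135 = (
    "AVLIIELLTG" "TVYLLVVSAA" "LAGSGIAYGL" "TGSTPAAVLT" "AALLSALGIW"
    "FVHAKTAVGK" "VETDSYQDLD" "AGQYAEILRH" "AGGNRYEVFY" "RGTHWQAQNT"
    "GQEELEPGTR" "ALIVRKEGNL" "LIIAKP"
)

print(f"{percent_identity(ORF13_1, ORF13A_10_135):.1f}%")  # 94.4%
```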



Homology with a predicted ORF from N. gonorrhoeae

ORF13 shows 89.7% identity over a 126 aa overlap with a predicted ORF (ORF13.ng) from N. gonorrhoeae:

orf13 AVLIIELLTGTVYLLVVSAALAGSGIAYGLTGSTPAAVLTXALLSALGIXF 51

I I I I I I II I I I M I I I II I I I I I I I I I I M II I I I I I I I I mm: II I 
orf13ng MTVWFVAAVAVLIIELLTGTVYLLVVSAALAGSGIAYGLTGSTPAAVLTAALLSALGIWF 60

orf13 VHAKTAVRKVETDSYQDLDAGQYVEILRHTGGNRYEVXYRGTXWQAQNTGQEELEPGTRA 111

I I 1 I I I I MII!IMlll:|:l:|lll:IMIiMI Mil II INI III :IMIII 
orf13ng VHAKTAVGKVETDSYQDLDTGKYAEILRYTGGNRYEVFYRGTHWQAQNTGQEVFEPGTRA 120

orf13 LIVRKEGNLLIITHP 126

I I I I I I I I I I I I:: I 
orf13ng LIVRKEGNLLIIANP 135

The complete length ORF13ng nucleotide sequence <SEQ ID 65> is: 

1 ATGACTGTAT GGTTTGTTGC CGCTGTTGCC GTCTTAATCA TCGAATTATT 

51 GACGGGAACG GTTTATCTTT TGGTTGTCAG CGCGGCTTTG GCGGGTTCGG 

101 GCATTGCCTA CGGGCTGACT GGCAGCACGC CTGCCGCCGT CTTGACCGCC 

151 GCACTGCTTT CCGCGCTGGG CATTTGGTTC GTACATGCCA AAACCGCCGT 

201 GGGAAAAGTT GAAACGGATT CATATCAGGA TTTGGATACC GGAAAATATG 

251 CCGAAATCCT CCGATACACA GGCGGCAACC GTTACGAAGT TTTTTATCGC 

301 GGTACGCACT GGCAGGCGCA AAATACGGGG CAGGAAGTGT TTGAACCGGG 

351 AACGCGCGCC CTCATCGTCC GCAAAGAAGG TAACCTTCTT ATCATCGCAA 

401 ACCCTTAA 
This encodes a protein having amino acid sequence <SEQ ID 66>:

1 MTVWFVAAVA VLIIELLTGT VYLLVVSAAL AGSGIAYGLT GSTPAAVLTA
51 ALLSALGIWF VHAKTAVGKV ETDSYQDLDT GKYAEILRYT GGNRYEVFYR 
101 GTHWQAQNTG QEVFEPGTRA LIVRKEGNLL IIANP* 

ORF13ng shows 91.3% identity in 126 aa overlap with ORF13-1: 

10 20 30 40 50 

orf13-1.pep AVLIIELLTGTVYLLVVSAALAGSGIAYGLTGSTPAAVLTXALLSALGIXF

I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I II I I I 
orf13ng MTVWFVAAVAVLIIELLTGTVYLLVVSAALAGSGIAYGLTGSTPAAVLTAALLSALGIWF

10 20 30 40 50 60 

60 70 80 90 100 110 

orf13-1.pep VHAKTAVRKVETDSYQDLDAGQYVEILRHTGGNRYEVFYRGTHWQAQNTGQEELEPGTRA

I I II I II I I I I I I I I I I I : I : I : I I I I : I I I I I I I I I I I I I I I I I I I I I I I : I I I I I I 
orf13ng VHAKTAVGKVETDSYQDLDTGKYAEILRYTGGNRYEVFYRGTHWQAQNTGQEVFEPGTRA

70 80 90 100 110 120 

120 

orf13-1.pep LIVRKEGNLLIITHPX

II I I II I I I I II : : II 
orf13ng LIVRKEGNLLIIANPX

130 

Based on this analysis, including the extensive leader sequence in this protein, ORF13 and
ORF13ng are predicted to be outer membrane proteins. It is thus predicted that the proteins
from N. meningitidis and N. gonorrhoeae, and their epitopes, could be useful antigens for vaccines
or diagnostics, or for raising antibodies.

Example 9 

The following DNA sequence was identified in N. meningitidis <SEQ ID 67>: 

1 ATGTwTGATT TCGGTTTrGG CGArCTGGTT TTTGTCGGCA TTATCGCCCT 

51 GATwGtCCTC GGCCCCGAAC GCsTGCCCGA GGCCGCCCGC AyCGCCGGAC 

101 GGcTCATCGG CAGGCTGCAA CGCTTTGTCG GcAGCGTCAA ACAGGAATTT 

151 GACACTCAAA TCGAACTGGA AGAACTGAGG AAGGCAAAGC AGGAATTTGA 

201 AGCTGCCGcC GCTCAGGTTC GAGACAGCCT CAAAGAAACC GGTACGGATA 

251 TGGAAGGCAA TCTGCACGAC ATTTCCGACG GTCTGAAGCC TTGGGAAAAA 

301 CTGCCCGAAC AGCGGACACC TGCCGATTTC GGTGTCGATG AAAACGGCAA 

351 TCCGCT.TCC CGATGCGGCA AACACCCTAT CAGACGGCAT TTCCGACGTT 

401 ATGCCGTC . . 

This corresponds to the amino acid sequence <SEQ ID 68; ORF2>: 

1 MXDFGLGELV FVGIIALIVL GPERXPEAAR XAGRLIGRLQ RFVGSVKQEF 
51 DTQIELEELR KAKQEFEAAA AQVRDSLKET GTDMEGNLHD ISDGLKPWEK 
101 LPEQRTPADF GVDENGNPXS RCGKHPIRRH FRRYAV. . 

Further work revealed the complete nucleotide sequence <SEQ ID 69>: 

1 ATGTTTGATT TCGGTTTGGG CGAGCTGGTT TTTGTCGGCA TTATCGCCCT 

51 GATTGTCCTC GGCCCCGAAC GCCTGCCCGA GGCCGCCCGC ACCGCCGGAC 

101 GGCTCATCGG CAGGCTGCAA CGCTTTGTCG GCAGCGTCAA ACAGGAATTT 

151 GACACTCAAA TCGAACTGGA AGAACTGAGG AAGGCAAAGC AGGAATTTGA 

201 AGCTGCCGCC GCTCAGGTTC GAGACAGCCT CAAAGAAACC GGTACGGATA 

251 TGGAAGGCAA TCTGCACGAC ATTTCCGACG GTCTGAAGCC TTGGGAAAAA 

301 CTGCCCGAAC AGCGGACACC TGCCGATTTC GGTGTCGATG AAAACGGCAA 

351 TCCGCTTCCC GATGCGGCAA ACACCCTATC AGACGGCATT TCCGACGTTA 

401 TGCCGTCCGA ACGTTCCTAC GCTTCCGCCG AAACCCTTGG GGACAGCGGG 

451 CAAACCGGCA GTACAGCCGA ACCCGCGGAA ACCGACCAAG ACCGCGCATG 

501 GCGGGAATAC CTGACTGCTT CTGCCGCCGC ACCCGTCGTA CAGACCGTCG 
551 AAGTCAGCTA TATCGATACT GCTGTTGAAA CGCCTGTTCC GCACACCACT
601 TCCCTGCGCA AACAGGCAAT AAGCCGCAAA CGCGATTTTC GTCCGAAACA 
651 CCGCGCCAAA CCTAAATTGC GCGTCCGTAA ATCATAA 

This corresponds to the amino acid sequence <SEQ ID 70; ORF2-1>:

1 MFDFGLGELV FVGIIALIVL GPERLPEAAR TAGRLIGRLQ RFVGSVKQEF

51 DTQIELEELR KAKQEFEAAA AQVRDSLKET GTDMEGNLHD ISDGLKPWEK 

101 LPEQRTPADF GVDENGNPLP DAANTLSDGI SDVMPSERSY ASAETLGDSG 

151 QTGSTAEPAE TDQDRAWREY LTASAAAPVV QTVEVSYIDT AVETPVPHTT

201 SLRKQAISRK RDFRPKHRAK PKLRVRKS*

Further work identified the corresponding gene in strain A of N. meningitidis <SEQ ID 71>:

1 ATGTTTGATT TCGGTTTGGG CGAGCTGGTT TTTGTCGGCA TTATCGCCCT 

51 GATTGTCCTC GGCCCCGAAC GCCTGCCCGA GGCCGCCCGC ACCGCCGGAC 

101 GGCTCATCGG CAGGCTGCAA CGCTTTGTCG GCAGCGTCAA ACAGGAATTT 

151 GACACGCAAA TCGAACTGGA AGAACTAAGG AAGGCAAAGC AGGAATTTGA 

201 AGCTGCCGCT GCTCAGGTTC GAGACAGCCT CAAAGAAACC GGTACGGATA 

251 TGGAGGGTAA TCTGCACGAC ATTTCCGACG GTCTGAAGCC TTGGGAAAAA 

301 CTGCCCGAAC AGCGCACGCC TGCTGATTTC GGTGTCGATG AAAACGGCAA 

351 TCCCTTTCCC GATGCGGCAA ACACCCTATT AGACGGCATT TCCGACGTTA 

401 TGCCGTCCGA ACGTTCCTAC GCTTCCGCCG AAACCCTTGG GGACAGCGGG 

451 CAAACCGGCA GTACAGCCGA ACCCGCGGAA ACCGACCAAG ACCGTGCATG 

501 GCGGGAATAC CTGACTGCTT CTGCCGCCGC ACCCGTCGTA CAGACCGTCG 

551 AAGTCAGCTA TATCGATACC GCTGTTGAAA CCCCTGTTCC GCATACCACT 

601 TCGCTGCGTA AACAGGCAAT AAGCCGCAAA CGCGATTTGC GTCCTAAATC 

651 CCGCGCCAAA CCTAAATTGC GCGTCCGTAA ATCATAA 

This encodes a protein having amino acid sequence <SEQ ID 72; ORF2a>: 

1 MFDFGLGELV FVGIIALIVL GPERLPEAAR TAGRLIGRLQ RFVGSVKQEF

51 DTQIELEELR KAKQEFEAAA AQVRDSLKET GTDMEGNLHD ISDGLKPWEK 

101 LPEQRTPADF GVDENGNPFP DAANTLLDGI SDVMPSERSY ASAETLGDSG 

151 QTGSTAEPAE TDQDRAWREY LTASAAAPW QTVEVSYIDT AVETPVPHTT 

201 SLRKQAISRK RDLRPKSRAK PKLRVRKS* 

The originally-identified partial strain B sequence (ORF2) shows 97.5% identity over a 118aa 
overlap with ORF2a: 

10 20 30 40 50 60

orf2.pep MXDFGLGELVFVGIIALIVLGPERXPEAARXAGRLIGRLQRFVGSVKQEFDTQIELEELR
I I I I I I I I I I I I I I I I I I I I I I I I I I I I : I I I I I I I I I I I I I II I I I I I I I I I I i I I I 
orf2a MFDFGLGELVFVGIIALIVLGPERLPEAARTAGRLIGRLQRFVGSVKQEFDTQIELEELR

10 20 30 40 50 60 

70 80 90 100 110 120 

orf2.pep KAKQEFEAAAAQVRDSLKETGTDMEGNLHDISDGLKPWEKLPEQRTPADFGVDENGNPXS

I I I I 1 I I I i I I I I I I I I I I I I I M I I I I I II I i I I I I I I I I I I I I I I I I I I I I I I I I I 
orf 2a KAKQEFEAAAAQVRDSLKETGTDMEGNLHDISDGLKPWEKLPEQRTPADFGVDENGNPFP 

70 80 90 100 110 120 

130 

orf2.pep RCGKHPIRRHFRRYAV

orf2a DAANTLLDGISDVMPSERSYASAETLGDSGQTGSTAEPAETDQDRAWREYLTASAAAPVV

130 140 150 160 170 180 

The complete strain B sequence (ORF2-1) and ORF2a show 98.2% identity in 228 aa overlap: 

orf2a.pep MFDFGLGELVFVGIIALIVLGPERLPEAARTAGRLIGRLQRFVGSVKQEFDTQIELEELR 60
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf2-1    MFDFGLGELVFVGIIALIVLGPERLPEAARTAGRLIGRLQRFVGSVKQEFDTQIELEELR 60

orf2a.pep KAKQEFEAAAAQVRDSLKETGTDMEGNLHDISDGLKPWEKLPEQRTPADFGVDENGNPFP 120
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||:|
orf2-1    KAKQEFEAAAAQVRDSLKETGTDMEGNLHDISDGLKPWEKLPEQRTPADFGVDENGNPLP 120

orf2a.pep DAANTLLDGISDVMPSERSYASAETLGDSGQTGSTAEPAETDQDRAWREYLTASAAAPVV 180
          |||||| |||||||||||||||||||||||||||||||||||||||||||||||||||||
orf2-1    DAANTLSDGISDVMPSERSYASAETLGDSGQTGSTAEPAETDQDRAWREYLTASAAAPVV 180

orf2a.pep QTVEVSYIDTAVETPVPHTTSLRKQAISRKRDLRPKSRAKPKLRVRKSX 229
          ||||||||||||||||||||||||||||||||:||| |||||||||||
orf2-1    QTVEVSYIDTAVETPVPHTTSLRKQAISRKRDFRPKHRAKPKLRVRKSX 229

Further work identified a partial DNA sequence <SEQ ID 73> in N. gonorrhoeae encoding the 
following amino acid sequence <SEQ ID 74; ORF2ng>: 

1 MFDFGLGELI FVGIIALIVL GPERLPEAAR TAGRLIGRLQ RFVGSVKQEL

51 DTQIELEELR KVKQAFEAAA AQVRDSLKET DTDMQNSLHD ISDGLKPWEK

101 LPEQRTPADF GVDEKGNSLS RYGKHRIRRH FRRYAV*

Further work identified the complete gonococcal gene sequence <SEQ ID 75>: 

1 ATGTTTGATT TCGGTTTGGG CGAGCTGATT TTTGTCGGCA TTATCGCCCT 

51 GATTGTCCTT GGTCCAGAAC GCCTGCCCGA AGCCGCCCGC ACTGCCGGAC 

101 GGCTTATCGG CAGGCTGCAA CGCTTTGTAG GAAGCGTCAA ACAAGAACTT

151 GACACTCAAA TCGAACTGGA AGAGCTGAGG AAGGTCAAGC AGGCATTCGA 

201 AGCTGCCGCC GCTCAGGTTC GAGACAGCCT CAAAGAAACC GATACGGATA 

251 TGCAGAACAG TCTGCACGAC ATTTCCGACG GTCTGAAGCC TTGGGAAAAA 

301 CTGCCCGAAC AGCGCACGCc tgccgatttc gGTGTCGATg AAAacggcaa 

351 tccccttccc gATACGGCAA ACACCGTATC AGACGGCATT TCCGACGTTA

401 TGCCGTCTGA ACGTTCCGAT ACTtccgcCG AAACCCTTGG GGACGACAGG 

451 CAAACCGGCA GTACAGCCGA ACCTGCGGAA ACCGACAAAG ACCGCGCATG 

501 GCGGGAATAC CTGactgctt ctgccgccgc acctgtcgta Cagagggccg 

551 tcgaagtcag ctaTATCGAT ACTGCTGTTG AAacgcctgT tccgcaCacc 

601 acttccctgc gcaAACAGGC AATAAACCGC AAACGCGATT TttgtccgaA

651 ACACCGCGCc aAACCGAAat tgcgcgtcCG TAAATCATAA 

This encodes a protein having the amino acid sequence <SEQ ID 76; ORF2ng-1>:

1 MFDFGLGELI FVGIIALIVL GPERLPEAAR TAGRLIGRLQ RFVGSVKQEL

51 DTQIELEELR KVKQAFEAAA AQVRDSLKET DTDMQNSLHD ISDGLKPWEK

101 LPEQRTPADF GVDENGNPLP DTANTVSDGI SDVMPSERSD TSAETLGDDR

151 QTGSTAEPAE TDKDRAWREY LTASAAAPVV QRAVEVSYID TAVETPVPHT

201 TSLRKQAINR KRDFCPKHRA KPKLRVRKS*

The originally-identified partial strain B sequence (ORF2) shows 87.5% identity over a 136aa 
overlap with ORF2ng: 

orf2.pep MXDFGLGELVFVGIIALIVLGPERXPEAARXAGRLIGRLQRFVGSVKQEFDTQIELEELR 60

I | | | | | I I : I I I I I I I I I I I I I I I I I II : I I I I I I I I I I I I I I I I I I : I I I I I I I II I 
orf2ng MFDFGLGELIFVGIIALIVLGPERLPEAARTAGRLIGRLQRFVGSVKQELDTQIELEELR 60 

orf 2 . pep KAKQEFEAAAAQVRDSLKETGTDMEGNLHDISDGLKPWEKLPEQRTPADFGVDENGNPXS 120 

I : I i I I I I I I I I I I I I I I I I I I ::: I I I I I I I I I I I I I I I I I t I I i I M I i I : I I

orf2ng KVKQAFEAAAAQVRDSLKETDTDMQNSLHDISDGLKPWEKLPEQRTPADFGVDEKGNSLP 120 

orf2.pep RCGKHPIRRHFRRYAV 136

j III II I I I I I I I I 
orf2ng RYGKHRIRRHFRRYAV 136

The complete strain B and gonococcal sequences (ORF2-1 & ORF2ng-1) show 91.7% identity in 229 aa overlap:

10 20 30 40 50 60

orf2-1.pep MFDFGLGELVFVGIIALIVLGPERLPEAARTAGRLIGRLQRFVGSVKQEFDTQIELEELR
           |||||||||:|||||||||||||||||||||||||||||||||||||||:||||||||||
orf2ng-1   MFDFGLGELIFVGIIALIVLGPERLPEAARTAGRLIGRLQRFVGSVKQELDTQIELEELR

10 20 30 40 50 60

70 80 90 100 110 120

orf2-1.pep KAKQEFEAAAAQVRDSLKETGTDMEGNLHDISDGLKPWEKLPEQRTPADFGVDENGNPLP
           |:|| ||||||||||||||| |||:::|||||||||||||||||||||||||||||||||
orf2ng-1   KVKQAFEAAAAQVRDSLKETDTDMQNSLHDISDGLKPWEKLPEQRTPADFGVDENGNPLP

70 80 90 100 110 120

130 140 150 160 170 180

orf2-1.pep DAANTLSDGISDVMPSERSYASAETLGDSGQTGSTAEPAETDQDRAWREYLTASAAAPVV
           |:|||:||||||||||||| :|||||||: ||||||||||||:|||||||||||||||||
orf2ng-1   DTANTVSDGISDVMPSERSDTSAETLGDDRQTGSTAEPAETDKDRAWREYLTASAAAPVV

130 140 150 160 170 180

190 200 210 220 229

orf2-1.pep Q-TVEVSYIDTAVETPVPHTTSLRKQAISRKRDFRPKHRAKPKLRVRKSX
           | :|||||||||||||||||||||||||:||||| ||||||||||||||
orf2ng-1   QRAVEVSYIDTAVETPVPHTTSLRKQAINRKRDFCPKHRAKPKLRVRKSX

190 200 210 220 230
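The identity figures quoted for the ORF2 family can be recomputed from the aligned rows. The sketch below assumes a simple convention, which is not necessarily the authors' FASTA setting: every column in which either sequence has a residue counts towards the overlap, and gap columns and X count as mismatches. Transcribing the full sequences, with residues 179-180 read as VV as encoded by the nucleotide sequences, reproduces 224/228 (98.2%) for ORF2a versus ORF2-1 and 210/229 (91.7%) for ORF2-1 versus ORF2ng-1. The function and constant names are illustrative.

```python
def percent_identity(row_a: str, row_b: str) -> tuple[int, int, float]:
    """Identity of two aligned rows ('-' = gap).

    Assumed convention: every column in which at least one row has a residue
    belongs to the overlap; gap columns and X count as non-identical.
    """
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    overlap = identical = 0
    for a, b in zip(row_a, row_b):
        if a == "-" and b == "-":
            continue
        overlap += 1
        if a == b and a not in "-X":
            identical += 1
    return identical, overlap, 100.0 * identical / overlap


ORF2_1 = (
    "MFDFGLGELVFVGIIALIVLGPERLPEAARTAGRLIGRLQRFVGSVKQEFDTQIELEELR"
    "KAKQEFEAAAAQVRDSLKETGTDMEGNLHDISDGLKPWEKLPEQRTPADFGVDENGNPLP"
    "DAANTLSDGISDVMPSERSYASAETLGDSGQTGSTAEPAETDQDRAWREYLTASAAAPVV"
    "QTVEVSYIDTAVETPVPHTTSLRKQAISRKRDFRPKHRAKPKLRVRKS"
)
ORF2A = (
    "MFDFGLGELVFVGIIALIVLGPERLPEAARTAGRLIGRLQRFVGSVKQEFDTQIELEELR"
    "KAKQEFEAAAAQVRDSLKETGTDMEGNLHDISDGLKPWEKLPEQRTPADFGVDENGNPFP"
    "DAANTLLDGISDVMPSERSYASAETLGDSGQTGSTAEPAETDQDRAWREYLTASAAAPVV"
    "QTVEVSYIDTAVETPVPHTTSLRKQAISRKRDLRPKSRAKPKLRVRKS"
)
ORF2NG_1 = (
    "MFDFGLGELIFVGIIALIVLGPERLPEAARTAGRLIGRLQRFVGSVKQELDTQIELEELR"
    "KVKQAFEAAAAQVRDSLKETDTDMQNSLHDISDGLKPWEKLPEQRTPADFGVDENGNPLP"
    "DTANTVSDGISDVMPSERSDTSAETLGDDRQTGSTAEPAETDKDRAWREYLTASAAAPVV"
    "QRAVEVSYIDTAVETPVPHTTSLRKQAINRKRDFCPKHRAKPKLRVRKS"
)

# The gonococcal protein is one residue longer; the published alignment
# places a gap after Gln181 of ORF2-1 ("Q-TVEV...").
ORF2_1_GAPPED = ORF2_1[:181] + "-" + ORF2_1[181:]

print(percent_identity(ORF2A, ORF2_1)[:2])            # (224, 228)
print(percent_identity(ORF2_1_GAPPED, ORF2NG_1)[:2])  # (210, 229)
```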

Computer analysis of these amino acid sequences indicates a transmembrane region (underlined), and also revealed homology (34% identity, 59% positives) between the gonococcal sequence and the TatB protein of E. coli:






gnl|PID|e1292181 (AJ005830) TatB protein [Escherichia coli] Length = 171
Score = 56.6 bits (134), Expect = 1e-07

Identities = 30/88 (34%), Positives = 52/88 (59%), Gaps = 1/88 (1%)

Query: 1  MFDFGLGELIFVGIIALIVLGPERLPEAARTAGRLIGRLQRFVGSVKQELDTQIELEELR 60
          MFD G  EL+ V II L+VLGP+RLP A +T    I  L+    +V+ EL  +++L+E +
Sbjct: 1  MFDIGFSELLLVFIIGLVVLGPQRLPVAVKTVAGWIRALRSLATTVQNELTQELKLQEFQ 60

Query: 61 -KVKQAFEAAAAQVRDSLKETDTDMQNS 87
            +K+  +A+   +   LK +  +++ +
Sbjct: 61 DSLKKVEKASLTNLTPELKASMDELRQA 88
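The summary line can be checked against the alignment rows. Identities and gaps are simple column counts; positives depend on the scoring matrix, which is not stated, so they are not recomputed here. The percentages printed in this document (30/88 as 34%, 52/88 as 59%, and later 15/43 as 34% and 23/43 as 53%) are all consistent with truncating rather than rounding, which the `blast_percent` helper below assumes. The subject row is transcribed with residues 18-19 as VV, the reading that gives 60 residues for the span 1-60; both function names are illustrative.

```python
def tally(query: str, subject: str) -> tuple[int, int, int]:
    """Count identical columns, gap columns and total columns ('-' = gap)."""
    if len(query) != len(subject):
        raise ValueError("rows must be the same length")
    identities = sum(1 for q, s in zip(query, subject) if q == s and q != "-")
    gaps = sum(1 for q, s in zip(query, subject) if q == "-" or s == "-")
    return identities, gaps, len(query)


def blast_percent(count: int, length: int) -> int:
    """Percentage as shown in the summary line, assuming integer truncation."""
    return (100 * count) // length


# The two alignment segments (query 1-60 + 61-87, subject 1-60 + 61-88).
QUERY = ("MFDFGLGELIFVGIIALIVLGPERLPEAARTAGRLIGRLQRFVGSVKQELDTQIELEELR"
         "-KVKQAFEAAAAQVRDSLKETDTDMQNS")
SUBJECT = ("MFDIGFSELLLVFIIGLVVLGPQRLPVAVKTVAGWIRALRSLATTVQNELTQELKLQEFQ"
           "DSLKKVEKASLTNLTPELKASMDELRQA")

identities, gaps, length = tally(QUERY, SUBJECT)
print(identities, gaps, length)                       # 30 1 88
print(blast_percent(identities, length))              # 34
print(blast_percent(15, 43), blast_percent(23, 43))   # 34 53
```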

Based on this analysis, it was predicted that ORF2, ORF2a and ORF2ng are likely to be membrane proteins, and so the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies.
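The text does not name the program used for the transmembrane prediction. A common, simple stand-in, used here purely as an illustrative assumption, is a Kyte-Doolittle hydropathy scan: average the residue scores over a sliding 19-residue window and flag windows whose mean exceeds 1.6, the window and cut-off Kyte and Doolittle suggested for membrane-spanning segments. Applied to the first 60 residues of ORF2-1, the windows that pass all start within the first ten residues, covering the hydrophobic stretch around FVGIIALIVL. The function names are hypothetical.

```python
# Kyte & Doolittle (1982) hydropathy values.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_windows(protein: str, window: int = 19) -> list[tuple[int, float]]:
    """Mean hydropathy of each window, keyed by its 1-based start position.

    Residues without a value (e.g. X) contribute 0.
    """
    scores = [KYTE_DOOLITTLE.get(aa, 0.0) for aa in protein.upper()]
    return [
        (start + 1, sum(scores[start:start + window]) / window)
        for start in range(len(scores) - window + 1)
    ]


def candidate_tm_starts(protein: str, window: int = 19, cutoff: float = 1.6) -> list[int]:
    """Start positions of windows whose mean hydropathy exceeds the cut-off."""
    return [start for start, mean in hydropathy_windows(protein, window) if mean > cutoff]


ORF2_1_N_TERMINUS = "MFDFGLGELVFVGIIALIVLGPERLPEAARTAGRLIGRLQRFVGSVKQEFDTQIELEELR"
print(candidate_tm_starts(ORF2_1_N_TERMINUS))
```

A window average is only a first screen; dedicated predictors also weigh flanking charges and topology.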

ORF2-1 (16kDa) was cloned in pET and pGex vectors and expressed in E. coli, as described above. The products of protein expression and purification were analyzed by SDS-PAGE. Figure 3A shows the results of affinity purification of the GST-fusion protein, and Figure 3B shows the results of expression of the His-fusion in E. coli. Purified GST-fusion protein was used to immunise mice, whose sera were used for Western blots (Figure 3C), ELISA (positive result), and FACS analysis (Figure 3D). These experiments confirm that ORF2-1 is a surface-exposed protein, and that it is a useful immunogen.

Example 10 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 77>:

1 ATGCAAGCAC GGCTGCTGAT ACCTATTCTT TTTTCAGTTT TTATTTTATC

51 CGC.TGCGGG ACACTGACAG GTATTCCATC GCATGGCGgA GkTAAACgCT

101 TTgCGGTCGA ACAAGAACTT GTGGCCGCTT CTGCCAGAGC TGCCGTTAAA 

151 GACATGGATT TACAGGCATT ACACGGACGA AAAGTTGCAT TGTACATTGC 

201 CACTATGGGC GACCAAGGTT CAGGcAGTTT GACAGGGGGG TCGCTACTCC 

251 ATTGATGCAC kGrTwCsTGG CGAATACATA AACAGCCCTG CCGTCCGTAC

301 CGATTACACC TATCCACGTT ACGAAACCAC CGCTGAAACA ACATCAGGCG 

351 GTTTGACAGG TTTAACCACT TCTTTATCTA CACTTAATGC CCCTGCACTC 

401 TCTCGCACCC AATCAGACGG TAGCGGAAGT AAAAGCAGTC TGGGCTTAAA 

451 TATTGGCGGG ATGGGGGATT ATCGAAATGA AACCTTGACG ACTAACCCGC 

501 GCGACACTGC CTTTCTTTCC CACTTGGTAC AGACCGTATT TTTCCTGCGC

551 GGCATAGACG TTGTTTCTCC TGCCAATGCC GATACAGATG TGTTTATTAA 

601 CATCGACGTA TTCGGAACGA TACGCAACAG AACCGAAATG..

This corresponds to the amino acid sequence <SEQ ID 78; ORF15>: 

1 MQARLLIPIL FSVFILSACG TLTGIPSHGG XKRFAVEQEL VAASARAAVK 

51 DMDLQALHGR KVALYIATMG DQGSGSLTGG RYSIDAXXXG EYINSPAVRT 

101 DYTYPRYETT AETTSGGLTG LTTSLSTLNA PALSRTQSDG SGSKSSLGLN 

151 IGGMGDYRNE TLTTNPRDTA FLSHLVQTVF FLRGIDVVSP ANADTDVFIN

201 IDVFGTIRNR TEM..
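Lower-case letters and IUPAC ambiguity codes in SEQ ID 77 (k = G or T, r = A or G, w = A or T, s = C or G) mark uncertain base calls, and the corresponding residues of ORF15 are shown as X where the codon cannot be resolved; codon 31, GkT, could be GGT (Gly) or GTT (Val). A minimal sketch of that rule follows, assuming the standard genetic code (the name `translate_codon` is hypothetical): expand every base the codes allow and report X unless all expansions encode the same residue.

```python
from itertools import product

# IUPAC nucleotide codes and the concrete bases each one allows.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Standard genetic code (NCBI translation table 1), codons in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate_codon(codon: str) -> str:
    """Translate one codon that may contain IUPAC ambiguity codes.

    Every concrete codon the codes allow is translated; if all give the same
    residue it is returned, otherwise 'X'. Unknown symbols such as '.' and
    incomplete codons also give 'X'. Case is ignored.
    """
    options = [IUPAC.get(base) for base in codon.upper()]
    if len(codon) != 3 or None in options:
        return "X"
    residues = {CODON_TABLE["".join(combo)] for combo in product(*options)}
    return residues.pop() if len(residues) == 1 else "X"


print(translate_codon("GkT"))  # X  (GGT = Gly, GTT = Val)
print(translate_codon("GGN"))  # G  (all four GGx codons are Gly)
```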

Further work revealed the complete nucleotide sequence <SEQ ID 79>: 

1 ATGCAAGCAC GGCTGCTGAT ACCTATTCTT TTTTCAGTTT TTATTTTATC 

51 CGCCTGCGGG ACACTGACAG GTATTCCATC GCATGGCGGA GGTAAACGCT 

101 TTGCGGTCGA ACAAGAACTT GTGGCCGCTT CTGCCAGAGC TGCCGTTAAA 

151 GACATGGATT TACAGGCATT ACACGGACGA AAAGTTGCAT TGTACATTGC 

201 CACTATGGGC GACCAAGGTT CAGGCAGTTT GACAGGGGGT CGCTACTCCA 

251 TTGATGCACT GATTCGTGGC GAATACATAA ACAGCCCTGC CGTCCGTACC 

301 GATTACACCT ATCCACGTTA CGAAACCACC GCTGAAACAA CATCAGGCGG 

351 TTTGACAGGT TTAACCACTT CTTTATCTAC ACTTAATGCC CCTGCACTCT 

401 CTCGCACCCA ATCAGACGGT AGCGGAAGTA AAAGCAGTCT GGGCTTAAAT 

451 ATTGGCGGGA TGGGGGATTA TCGAAATGAA ACCTTGACGA CTAACCCGCG 

501 CGACACTGCC TTTCTTTCCC ACTTGGTACA GACCGTATTT TTCCTGCGCG 

551 GCATAGACGT TGTTTCTCCT GCCAATGCCG ATACAGATGT GTTTATTAAC 

601 ATCGACGTAT TCGGAACGAT ACGCAACAGA ACCGAAATGC ACCTATACAA 

651 TGCCGAAACA CTGAAAGCCC AAACAAAACT GGAATATTTC GCAGTAGACA 

701 GAACCAATAA AAAATTGCTC ATCAAACCAA AAACCAATGC GTTTGAAGCT 

751 GCCTATAAAG AAAATTACGC ATTGTGGATG GGGCCGTATA AAGTAAGCAA 

801 AGGAATTAAA CCGACGGAAG GATTAATGGT CGATTTCTCC GATATCCGAC 

851 CATACGGCAA TCATACGGGT AACTCCGCCC CATCCGTAGA GGCTGATAAC 

901 AGTCATGAGG GGTATGGATA CAGCGATGAA GTAGTGCGAC AACATAGACA 

951 AGGACAACCT TGA 
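Calling a sequence a complete gene implies a few simple properties that can be checked mechanically: an ATG start, a length divisible by three, a terminal stop codon and no earlier in-frame stop. SEQ ID 79, for example, runs from the ATG at position 1 to the TGA ending at position 963, i.e. 321 codons for the 320 residues of ORF15-1 plus the stop. A small sketch of such a check follows; the function name `check_orf` is illustrative, and only the standard stop codons TAA, TAG and TGA are assumed.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def check_orf(dna: str) -> list[str]:
    """Return the problems found in a putative complete open reading frame.

    Digits and spaces from the listing layout are ignored. An empty list
    means the sequence starts with ATG, has a length divisible by three,
    ends with a stop codon and contains no earlier in-frame stop.
    """
    seq = "".join(ch for ch in dna.upper() if ch.isalpha() or ch == ".")
    problems = []
    if len(seq) % 3:
        problems.append(f"length {len(seq)} is not a multiple of 3")
    codons = [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]
    if not codons or codons[0] != "ATG":
        problems.append("does not start with ATG")
    if not codons or codons[-1] not in STOP_CODONS:
        problems.append("does not end with a stop codon")
    early = [n + 1 for n, codon in enumerate(codons[:-1]) if codon in STOP_CODONS]
    if early:
        problems.append(f"in-frame stop at codon(s) {early}")
    return problems


print(check_orf("1 ATGAAATAA"))      # []
print(check_orf("ATG TAA AAA TAG"))  # ['in-frame stop at codon(s) [2]']
```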

This corresponds to the amino acid sequence <SEQ ID 80; ORF15-1>:

1 MQARLLIPIL FSVFILSACG TLTGIPSHGG GKRFAVEQEL VAASARAAVK

51 DMDLQALHGR KVALYIATMG DQGSGSLTGG RYSIDALIRG EYINSPAVRT

101 DYTYPRYETT AETTSGGLTG LTTSLSTLNA PALSRTQSDG SGSKSSLGLN

151 IGGMGDYRNE TLTTNPRDTA FLSHLVQTVF FLRGIDVVSP ANADTDVFIN

201 IDVFGTIRNR TEMHLYNAET LKAQTKLEYF AVDRTNKKLL IKPKTNAFEA

251 AYKENYALWM GPYKVSKGIK PTEGLMVDFS DIRPYGNHTG NSAPSVEADN

301 SHEGYGYSDE VVRQHRQGQP*

Further work identified the corresponding gene in strain A of N. meningitidis <SEQ ID 81>: 

1 ATGCAAGCAC GGCTGCTGAT ACCTATTCTT TTTTCAGTTT TTATTTTATC 

51 CGCCTGCGGG ACACTGACAG GTATTCCATC GCATGGCGGA GGTAAACGCT

101 TTGCGGTCGA ACAAGAACTT GTGGCCGCTT CTGCCAGAGC TGCCGTTAAA 

151 GACATGGATT TACAGGCATT ACACGGACGA AAAGTTGCAT TGTACATTGC 

201 AACTATGGGC GACCAAGGTT CAGGCAGTTT GACAGGGGGT CGCTACTCCA 

251 TTGATGCACT GATTCGTGGC GAATACATAA ACAGCCCTGC CGTCCGTACC 

301 GATTACACCT ATCCACGTTA CGAAACCACC GCTGAAACAA CATCAGGCGG 

351 TTTGACAGGT TTAACCACTT CTTTATCTAC ACTTAATGCC CCTGCACTCT 

401 CGCGCACCCA ATCAGACGGT AGCGGAAGTA AAAGCAGTCT GGGCTTAAAT 

451 ATTGGCGGGA TGGGGGATTA TCGAAATGAA ACCTTGACGA CTAACCCGCG 

501 CGACACTGCC TTTCTTTCCC ACTTGGTACA GACCGTATTT TTCCTGCGCG 

551 GCATAGACGT TGTTTCTCCT GCCAATGCCG ATACGGATGT GTTTATTAAC 

601 ATCGACGTAT TCGGAACGAT ACGCAACAGA ACCGAAATGC ACCTATACAA 

651 TGCCGAAACA CTGAAAGCCC AAACAAAACT GGAATATTTC GCAGTAGACA 

701 GAACCAATAA AAAATTGCTC ATCAAACCAA AAACCAATGC GTTTGAAGCT 

751 GCCTATAAAG AAAATTACGC ATTGTGGATG GGACCGTATA AAGTAAGCAA 

801 AGGAATTAAA CCGACAGAAG GATTAATGGT CGATTTCTCC GATATCCAAC 

851 CATACGGCAA TCATATGGGT AACTCTGCCC CATCCGTAGA GGCTGATAAC 

901 AGTCATGAGG GGTATGGATA CAGCGATGAA GCAGTGCGAC GACATAGACA 

951 AGGGCAACCT TGA 

This encodes a protein having amino acid sequence <SEQ ID 82; ORF15a>: 

1 MQARLLIPIL FSVFILSACG TLTGIPSHGG GKRFAVEQEL VAASARAAVK 

51 DMDLQALHGR KVALYIATMG DQGSGSLTGG RYSIDALIRG EYINSPAVRT

101 DYTYPRYETT AETTSGGLTG LTTSLSTLNA PALSRTQSDG SGSKSSLGLN

151 IGGMGDYRNE TLTTNPRDTA FLSHLVQTVF FLRGIDVVSP ANADTDVFIN

201 IDVFGTIRNR TEMHLYNAET LKAQTKLEYF AVDRTNKKLL IKPKTNAFEA

251 AYKENYALWM GPYKVSKGIK PTEGLMVDFS DIQPYGNHMG NSAPSVEADN

301 SHEGYGYSDE AVRRHRQGQP*

The originally-identified partial strain B sequence (ORF15) shows 98.1% identity over a 213aa 
overlap with ORF15a: 

10 20 30 40 50 60 

orf15.pep MQARLLIPILFSVFILSACGTLTGIPSHGGXKRFAVEQELVAASARAAVKDMDLQALHGR
I I I I I I I I I I I I I ! I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
orf15a MQARLLIPILFSVFILSACGTLTGIPSHGGGKRFAVEQELVAASARAAVKDMDLQALHGR

10 20 30 40 50 60 

70 80 90 100 110 120 

orf15.pep KVALYIATMGDQGSGSLTGGRYSIDAXXXGEYINSPAVRTDYTYPRYETTAETTSGGLTG
I I I I I I I I I I I I I 1 I I I I I I I I I I I I I M I I I I If I I I I I M I M I I I I I I I I I II I 
orf 15a KVALYIATMGDQGSGSLTGGRYSIDALIRGEYINSPAVRTDYTYPRYETTAETTSGGLTG 

70 80 90 100 110 120 

130 140 150 160 170 180 

orf 15 . pep LTTSLSTLNAPALSRTQSDGSGSKSSLGLNIGGMGDYRNETLTTNPRDTAFLSHLVQTVF 
I I I I I I I I I i I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
orf 15a LTTSLSTLNAPALSRTQSDGSGSKSSLGLNIGGMGDYRNETLTTNPRDTAFLSHLVQTVF 

130 140 150 160 170 180 

190 200 210 

orf 15 . pep FLRGIDVVSPANADTDVFINIDVFGTIRNRTEM 
I I I I II I I i I I I I I I I I I I I I I I I I I I I I I I I I 
orf15a FLRGIDVVSPANADTDVFINIDVFGTIRNRTEMHLYNAETLKAQTKLEYFAVDRTNKKLL

190 200 210 220 230 240 

The complete strain B sequence (ORF15-1) and ORF15a show 98.8% identity in 320 aa overlap: 

10 20 30 40 50 60

orf15a.pep MQARLLIPILFSVFILSACGTLTGIPSHGGGKRFAVEQELVAASARAAVKDMDLQALHGR
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf15-1    MQARLLIPILFSVFILSACGTLTGIPSHGGGKRFAVEQELVAASARAAVKDMDLQALHGR

10 20 30 40 50 60

70 80 90 100 110 120

orf15a.pep KVALYIATMGDQGSGSLTGGRYSIDALIRGEYINSPAVRTDYTYPRYETTAETTSGGLTG
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf15-1    KVALYIATMGDQGSGSLTGGRYSIDALIRGEYINSPAVRTDYTYPRYETTAETTSGGLTG

70 80 90 100 110 120

130 140 150 160 170 180

orf15a.pep LTTSLSTLNAPALSRTQSDGSGSKSSLGLNIGGMGDYRNETLTTNPRDTAFLSHLVQTVF
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf15-1    LTTSLSTLNAPALSRTQSDGSGSKSSLGLNIGGMGDYRNETLTTNPRDTAFLSHLVQTVF

130 140 150 160 170 180

190 200 210 220 230 240

orf15a.pep FLRGIDVVSPANADTDVFINIDVFGTIRNRTEMHLYNAETLKAQTKLEYFAVDRTNKKLL
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf15-1    FLRGIDVVSPANADTDVFINIDVFGTIRNRTEMHLYNAETLKAQTKLEYFAVDRTNKKLL

190 200 210 220 230 240

250 260 270 280 290 300

orf15a.pep IKPKTNAFEAAYKENYALWMGPYKVSKGIKPTEGLMVDFSDIQPYGNHMGNSAPSVEADN
           ||||||||||||||||||||||||||||||||||||||||||:||||| |||||||||||
orf15-1    IKPKTNAFEAAYKENYALWMGPYKVSKGIKPTEGLMVDFSDIRPYGNHTGNSAPSVEADN

250 260 270 280 290 300

310 320

orf15a.pep SHEGYGYSDEAVRRHRQGQPX
           ||||||||||:||:||||||
orf15-1    SHEGYGYSDEVVRQHRQGQPX

310 320

Further work identified the corresponding gene in N. gonorrhoeae <SEQ ID 83>:

1 ATGCGGGCAC GGCTGCTGAT ACCTATTCTT TTTTCAGTTT TTATTTTATC 

51 CGCCTGCGGG ACACTGACAG GTATTCCATC GCATGGCGGA GGCAAACGCT 

101 TCGCGGTCGA ACAAGAACTT GTGGCCGCTT CTGCCAGAGC TGCCGTTAAA 

151 GACATGGATT TACAGGCATT ACACGGACGA AAAGTTGCAT TGTACATTGC 

201 AACTATGGGC GACCAAGGTT CAGGCAGTTT GACAGGGGGT CGCTACTCCA 

251 TTGATGCACT GATTCGCGGC GAATACATAA ACAGCCCTGC CGTCCGCACC 

301 GATTACACCT ATCCGCGTTA CGAAACCACC GCTGAAACAA CATCAGGCGG 

351 TTTGACGGGT TTAACCACTT CTTTATCTAC ACTTAATGCC CCTGCACTCT 

401 CGCGCACCCA ATCAGACGGT AGCGGAAGTA GGAGCAGTCT GGGCTTAAAT 

451 ATTGGCGGGA TGGGGGATTA TCGAAATGAA ACCTTGACGA CCAACCCGCG 

501 CGACACTGCC TTTCTTTCCC ACTTGGTGCA GACCGTATTT TTCCTGCGCG 

551 GCATAGACGT TGTTTCTCCT GCCAATGCCG ATACAGATGT GTTTATTAAC 

601 ATCGACGTAT TCGGAACGAT ACGCAACAGA ACCGAAATGC ACCTATACAA 

651 TGCCGAAACA CTGAAAGCCC AAACAAAACT GGAATATTTC GCAGTAGACA 

701 GAACCAATAA AAAATTGCTC ATCAAACCCA AAACCAATGC GTTTGAAGCT 

751 GCCTATAAAG AAAATTACGC ATTGTGGATG GGGCCGTATA AAGTAAGCAA 

801 AGGAATCAAA CCGACGGAAG GATTGATGGT CGATTTCTCC GATATCCAAC 

851 CATACGGCAA TCATACGGGT AACTCCGCCC CATCCGTAGA GGCTGATAAC 

901 AGTCATGAGG GGTATGGATA CAGCGATGAA GCAGTGCGAC AACATAGACA 

951 AGGGCAACCT TGA 

This encodes a protein having amino acid sequence <SEQ ID 84; ORF15ng>: 



1 MRARLLIPIL FSVFILSACG TLTGIPSHGG GKRFAVEQEL VAASARAAVK

51 DMDLQALHGR KVALYIATMG DQGSGSLTGG RYSIDALIRG EYINSPAVRT

101 DYTYPRYETT AETTSGGLTG LTTSLSTLNA PALSRTQSDG SGSRSSLGLN

151 IGGMGDYRNE TLTTNPRDTA FLSHLVQTVF FLRGIDVVSP ANADTDVFIN

201 IDVFGTIRNR TEMHLYNAET LKAQTKLEYF AVDRTNKKLL IKPKTNAFEA

251 AYKENYALWM GPYKVSKGIK PTEGLMVDFS DIQPYGNHTG NSAPSVEADN

301 SHEGYGYSDE AVRQHRQGQP*

The originally-identified partial strain B sequence (ORF15) shows 97.2% identity over a 213aa overlap with ORF15ng:

orf15.pep MQARLLIPILFSVFILSACGTLTGIPSHGGXKRFAVEQELVAASARAAVKDMDLQALHGR 60

I : I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I 1 I 
orf15ng MRARLLIPILFSVFILSACGTLTGIPSHGGGKRFAVEQELVAASARAAVKDMDLQALHGR 60



orf15.pep KVALYIATMGDQGSGSLTGGRYSIDAXXXGEYINSPAVRTDYTYPRYETTAETTSGGLTG 120

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

orf15ng KVALYIATMGDQGSGSLTGGRYSIDALIRGEYINSPAVRTDYTYPRYETTAETTSGGLTG 120

orf15.pep LTTSLSTLNAPALSRTQSDGSGSKSSLGLNIGGMGDYRNETLTTNPRDTAFLSHLVQTVF 180

I I I I I I I I ! I II I I I I I I I I I I I : I I I I II I I I II I I I I I I I I I II I I I I I I I I I I II I I 

orf15ng LTTSLSTLNAPALSRTQSDGSGSRSSLGLNIGGMGDYRNETLTTNPRDTAFLSHLVQTVF 180

orf15.pep FLRGIDVVSPANADTDVFINIDVFGTIRNRTEM 213

I I I I II I I I I I I I I I I II I I I I II I I I I I I I I I 

orf15ng FLRGIDVVSPANADTDVFINIDVFGTIRNRTEMHLYNAETLKAQTKLEYFAVDRTNKKLL 240

The complete strain B sequence (ORF15-1) and ORF15ng show 98.8% identity in 320 aa overlap:

10 20 30 40 50 60

orf15-1.pep MQARLLIPILFSVFILSACGTLTGIPSHGGGKRFAVEQELVAASARAAVKDMDLQALHGR
            |:||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf15ng     MRARLLIPILFSVFILSACGTLTGIPSHGGGKRFAVEQELVAASARAAVKDMDLQALHGR

10 20 30 40 50 60

70 80 90 100 110 120

orf15-1.pep KVALYIATMGDQGSGSLTGGRYSIDALIRGEYINSPAVRTDYTYPRYETTAETTSGGLTG
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf15ng     KVALYIATMGDQGSGSLTGGRYSIDALIRGEYINSPAVRTDYTYPRYETTAETTSGGLTG

70 80 90 100 110 120

130 140 150 160 170 180

orf15-1.pep LTTSLSTLNAPALSRTQSDGSGSKSSLGLNIGGMGDYRNETLTTNPRDTAFLSHLVQTVF
            |||||||||||||||||||||||:||||||||||||||||||||||||||||||||||||
orf15ng     LTTSLSTLNAPALSRTQSDGSGSRSSLGLNIGGMGDYRNETLTTNPRDTAFLSHLVQTVF

130 140 150 160 170 180

190 200 210 220 230 240

orf15-1.pep FLRGIDVVSPANADTDVFINIDVFGTIRNRTEMHLYNAETLKAQTKLEYFAVDRTNKKLL
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf15ng     FLRGIDVVSPANADTDVFINIDVFGTIRNRTEMHLYNAETLKAQTKLEYFAVDRTNKKLL

190 200 210 220 230 240

250 260 270 280 290 300

orf15-1.pep IKPKTNAFEAAYKENYALWMGPYKVSKGIKPTEGLMVDFSDIRPYGNHTGNSAPSVEADN
            ||||||||||||||||||||||||||||||||||||||||||:|||||||||||||||||
orf15ng     IKPKTNAFEAAYKENYALWMGPYKVSKGIKPTEGLMVDFSDIQPYGNHTGNSAPSVEADN

250 260 270 280 290 300

310 320

orf15-1.pep SHEGYGYSDEVVRQHRQGQPX
            ||||||||||:|||||||||
orf15ng     SHEGYGYSDEAVRQHRQGQPX

310 320
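Because ORF15-1 and ORF15ng are both 320 residues long and align without gaps, the 98.8% figure corresponds to 4 differing positions (316/320 = 98.75%). The sketch below lists them; the sequences are transcribed from SEQ ID 80 and SEQ ID 84 with residues 187-188 (and 311-312 of ORF15-1) read as VV, which is what the nucleotide sequences encode, and the helper name `substitutions` is illustrative.

```python
def substitutions(seq_a: str, seq_b: str, start: int = 1) -> list[tuple[int, str, str]]:
    """(position, residue in seq_a, residue in seq_b) for each differing column.

    Assumes the two sequences are already aligned without gaps.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be the same length")
    return [(start + i, a, b) for i, (a, b) in enumerate(zip(seq_a, seq_b)) if a != b]


ORF15_1 = (
    "MQARLLIPILFSVFILSACGTLTGIPSHGGGKRFAVEQELVAASARAAVK"
    "DMDLQALHGRKVALYIATMGDQGSGSLTGGRYSIDALIRGEYINSPAVRT"
    "DYTYPRYETTAETTSGGLTGLTTSLSTLNAPALSRTQSDGSGSKSSLGLN"
    "IGGMGDYRNETLTTNPRDTAFLSHLVQTVFFLRGIDVVSPANADTDVFIN"
    "IDVFGTIRNRTEMHLYNAETLKAQTKLEYFAVDRTNKKLLIKPKTNAFEA"
    "AYKENYALWMGPYKVSKGIKPTEGLMVDFSDIRPYGNHTGNSAPSVEADN"
    "SHEGYGYSDEVVRQHRQGQP"
)
ORF15NG = (
    "MRARLLIPILFSVFILSACGTLTGIPSHGGGKRFAVEQELVAASARAAVK"
    "DMDLQALHGRKVALYIATMGDQGSGSLTGGRYSIDALIRGEYINSPAVRT"
    "DYTYPRYETTAETTSGGLTGLTTSLSTLNAPALSRTQSDGSGSRSSLGLN"
    "IGGMGDYRNETLTTNPRDTAFLSHLVQTVFFLRGIDVVSPANADTDVFIN"
    "IDVFGTIRNRTEMHLYNAETLKAQTKLEYFAVDRTNKKLLIKPKTNAFEA"
    "AYKENYALWMGPYKVSKGIKPTEGLMVDFSDIQPYGNHTGNSAPSVEADN"
    "SHEGYGYSDEAVRQHRQGQP"
)

diffs = substitutions(ORF15_1, ORF15NG)
print(diffs)  # [(2, 'Q', 'R'), (144, 'K', 'R'), (283, 'R', 'Q'), (311, 'V', 'A')]
print(f"{len(ORF15_1) - len(diffs)}/{len(ORF15_1)} identical")  # 316/320 identical
```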

Computer analysis of these amino acid sequences reveals an ILSAC motif (a putative membrane lipoprotein lipid attachment site, as predicted by the MOTIFS program) and indicates a putative leader sequence. It was therefore predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies.
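The ILSAC motif occupies residues 15-19 of ORF15-1 and ORF15ng, ending in the cysteine that would carry the lipid in a mature lipoprotein. As a rough, hedged illustration, the widely used bacterial lipobox consensus [LVI][ASTVI][GAS]C can be searched for with a regular expression. The exact pattern used by the MOTIFS program is not given in the text, so the regex, the 40-residue search window and the function name `find_lipobox` are all assumptions.

```python
import re

# Bacterial lipobox consensus [LVI][ASTVI][GAS]C. This is an assumed,
# simplified stand-in for the MOTIFS pattern, not the pattern itself.
LIPOBOX = re.compile(r"[LVI][ASTVI][GAS]C")


def find_lipobox(protein: str, search_limit: int = 40) -> list[tuple[int, str]]:
    """Return (1-based position of the motif Cys, matched motif) pairs.

    Only the first `search_limit` residues are searched, since a lipobox is
    expected at the C-terminal end of an N-terminal signal peptide.
    """
    region = protein[:search_limit]
    # m.end() is the 0-based index just past the Cys, i.e. its 1-based position.
    return [(m.end(), m.group()) for m in LIPOBOX.finditer(region)]


ORF15_1_N_TERMINUS = "MQARLLIPILFSVFILSACGTLTGIPSHGGGKRFAVEQEL"
print(find_lipobox(ORF15_1_N_TERMINUS))  # [(19, 'LSAC')]
```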

ORF15-1 (31.7kDa) was cloned in pET and pGex vectors and expressed in E. coli, as described above. The products of protein expression and purification were analyzed by SDS-PAGE. Figure 4A shows the results of affinity purification of the GST-fusion protein, and Figure 4B shows the results of expression of the His-fusion in E. coli. Purified GST-fusion protein was used to immunise mice, whose sera were used for Western blot (Figure 4C) and ELISA (positive result). These experiments confirm that ORF15-1 is a surface-exposed protein, and that it is a useful immunogen.

Example 11 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 85>:

1 ..GG.CAGCACA AAAAACAGGC GGTTGAACGG AAAAACCGTA TTTACGATGA

51 TGCCGGGTAT GATATTCGGC GTATTCACGG GCGCATTCTC CGCAAAATAT 

101 ATCCCCGCGT TCGGGCTTCA AATTTTCTTC ATCCTGTTTT TAACCGCCGT 

151 CGCATTCAAA ACACTGCATA CCGACCCTCA GACGGCATCC CGCCCGCTGC 

201 CCGGACTGCC CrGACTGACT GCGGTTTCCA CACTGTTCGG CACAATGTCG 

251 AGCTGGGTCG GCATAGGCGG CGGTTCACTT TCCGTCCCCT TCTTAATCCA

301 CTGCGGCTTC CCCGCCCATA AAGCCATCGG CACATCATCC GGCCTTGCCT 

351 GGCCGATTGC ACTCTCCGGC GCAATATCGT ATCTGCTCAA CGGCCTGAAT 

401 ATTGCAGGAT TGCCCGAAGG GTCACTGGGC TTCCTTTACC TGCCCGCCGT 

451 CGCCGTCCTC AGCGCGGCAA CCATTGCCTT TGCCCCGCTC GGTGTCAAAA

501 CCGCCCACAA ACTTTCTTCT GCCAAACTCA AAAAATC.TT CGGCATTATG

551 TTGCTTTTGA TTGCCGGAAA AATGCTGTAC AACCTGCTTT AA 



This corresponds to the amino acid sequence <SEQ ID 86; ORF17>: 



1 ..GQHKKQAVNG KTVFTMMPGM IFGVFTGAFS AKYIPAFGLQ IFFILFLTAV 
51 AFKTLHTDPQ TASRPLPGLP XLTAVSTLFG TMSSWVGIGG GSLSVPFLIH 
101 CGFPAHKAIG TSSGLAWPIA LSGAISYLLN GLNIAGLPEG SLGFLYLPAV
151 AVLSAATIAF APLGVKTAHK LSSAKLKKSF GIMLLLIAGK MLYNLL* 

Further work revealed the complete nucleotide sequence <SEQ ID 87>: 

1 ATGTGGCATT GGGACATTAT CTTAATCCTG CTTGCCGTAG GCAGTGCGGC 

51 AGGTTTTATT GCCGGCCTGT TCGGCGTAGG CGGCGGCACG CTGATTGTCC 

101 CTGTCGTTTT ATGGGTGCTT GATTTGCAGG GTTTGGCACA ACATCCTTAC 

151 GCGCAACACC TCGCCGTCGG CACATCCTTC GCCGTCATGG TCTTCACCGC 

201 CTTTTCCAGT ATGCTGGGGC AGCACAAAAA ACAGGCGGTC GACTGGAAAA 

251 CCGTATTTAC GATGATGCCG GGTATGATAT TCGGCGTATT CACGGGCGCA 

301 CTCTCCGCAA AATATATCCC CGCGTTCGGG CTTCAAATTT TCTTCATCCT 

351 GTTTTTAACC GCCGTCGCAT TCAAAACACT GCATACCGAC CCTCAGACGG 

401 CATCCCGCCC GCTGCCCGGA CTGCCCGGAC TGACTGCGGT TTCCACACTG 

451 TTCGGCACAA TGTCGAGCTG GGTCGGCATA GGCGGCGGTT CACTTTCCGT 

501 CCCCTTCTTA ATCCACTGCG GCTTCCCCGC CCATAAAGCC ATCGGCACAT 

551 CATCCGGCCT TGCCTGGCCG ATTGCACTCT CCGGCGCAAT ATCGTATCTG 

601 CTCAACGGCC TGAATATTGC AGGATTGCCC GAAGGGTCAC TGGGCTTCCT 

651 TTACCTGCCC GCCGTCGCCG TCCTCAGCGC GGCAACCATT GCCTTTGCCC 

701 CGCTCGGTGT CAAAACCGCC CACAAACTTT CTTCTGCCAA ACTCAAAAAA 

751 TC.TTCGGCA TTATGTTGCT TTTGATTGCC GGAAAAATGC TGTACAACCT 

801 GCTTTAA 

This corresponds to the amino acid sequence <SEQ ID 88; ORF17-1>:

1 MWHWDIILIL LAVGSAAGFI AGLFGVGGGT LIVPVVLWVL DLQGLAQHPY

51 AQHLAVGTSF AVMVFTAFSS MLGQHKKQAV DWKTVFTMMP GMIFGVFTGA

101 LSAKYIPAFG LQIFFILFLT AVAFKTLHTD PQTASRPLPG LPGLTAVSTL

151 FGTMSSWVGI GGGSLSVPFL IHCGFPAHKA IGTSSGLAWP IALSGAISYL

201 LNGLNIAGLP EGSLGFLYLP AVAVLSAATI AFAPLGVKTA HKLSSAKLKK

251 XFGIMLLLIA GKMLYNLL*

Computer analysis of this amino acid sequence gave the following results: 

Homology with hypothetical H. influenzae transmembrane protein HI0902 (accession number P44070)
ORF17 and HI0902 proteins show 28% aa identity in 192 aa overlap:

ORF17    3 HKKQAVNGKTVFTMMPGMIFGVFT-GAFSAKYIPAFGLQIF--FILFLTAVAFKTLHTDP 59
           HK + + V + P++ VF GF + +IF +++L ++ D
HI0902  72

ORF17   60 QTASRPLPGLPXLTAVSTLFGTMSSWVGIGGGSLSVPFLIHCGFPAHKAIGTSSGLAWPI 119
           Q  ++ L  L  +     L G  SS  GIGGG   VPFL   G    +AIG+S+     +
HI0902 131 QVTTKSLTPLSSVIG-GILIGMASSAAGIGGGGFIVPFLTARGINIKQAIGSSAFCGMLL 189

ORF17  120 ALSGAISYLLNGLNIAGLPEGSLGFLYLPAVAVLSAATIAFAPLGVXXXXXXXXXXXXXX 179
            +SG  S++++G     +PE SLG++YLPAV  ++A +   + LG
HI0902 190 GISGMFSFIVSGWGNPLMPEYSLGYIYLPAVLGITATSFFTSKLGASATAKLPVSTLKKG 249

ORF17  180 FGIMLLLIAGKM 191
           F + L+++A  M
HI0902 250 FALFLIVVAINM 261

Homology with a predicted ORF from N. meningitidis (strain A)

ORF17 shows 96.9% identity over a 196aa overlap with an ORF (ORF17a) from strain A of N. meningitidis:

10 20 30

orf17.pep                               GQHKKQAVNGKTVFTMMPGMIFGVFTGAFS
                                        ||||||||: ||||||||||:||||:||:|
orf17a    QGLAQHPYAQHLAVGTSFAVMVFTAFSSMLGQHKKQAVDWKTVFTMMPGMVFGVFAGALS

50 60 70 80 90 100

40 50 60 70 80 90

orf17.pep AKYIPAFGLQIFFILFLTAVAFKTLHTDPQTASRPLPGLPXLTAVSTLFGTMSSWVGIGG
          |||||||||||||||||||||||||||||||||||||||| |||||||||||||||||||
orf17a    AKYIPAFGLQIFFILFLTAVAFKTLHTDPQTASRPLPGLPGLTAVSTLFGTMSSWVGIGG

110 120 130 140 150 160

100 110 120 130 140 150

orf17.pep GSLSVPFLIHCGFPAHKAIGTSSGLAWPIALSGAISYLLNGLNIAGLPEGSLGFLYLPAV
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf17a    GSLSVPFLIHCGFPAHKAIGTSSGLAWPIALSGAISYLLNGLNIAGLPEGSLGFLYLPAV

170 180 190 200 210 220

160 170 180 190

orf17.pep AVLSAATIAFAPLGVKTAHKLSSAKLKKSFGIMLLLIAGKMLYNLLX
          ||||||||||||||||||||||||||||||||||||||||||||||
orf17a    AVLSAATIAFAPLGVKTAHKLSSAKLKKSFGIMLLLIAGKMLYNLLX

230 240 250 260

The complete length ORF17a nucleotide sequence <SEQ ID 89> is:

1 ATGTGGCATT GGGACATTAT CTTAATCCTG CTTGCCGTAG GCAGTGCGGC

51 AGGTTTTATT GCCGGCCTGT TCGGCGTAGG CGGCGGCACG CTGATTGTCC

101 CTGTCGTTTT ATGGGTGCTT GATTTGCAGG GTTTGGCACA ACATCCTTAC

151 GCGCAACACC TCGCCGTCGG CACATCCTTC GCCGTCATGG TCTTCACCGC

201 CTTTTCCAGT ATGCTGGGGC AGCACAAAAA ACAGGCGGTC GACTGGAAAA

251 CCGTATTTAC GATGATGCCG GGTATGGTAT TCGGCGTATT CGCTGGCGCA

301 CTCTCCGCAA AATATATCCC AGCGTTCGGG CTTCAAATTT TCTTCATCCT

351 GTTTTTAACC GCCGTCGCAT TCAAAACACT GCATACCGAC CCTCAGACGG

401 CATCCCGCCC GCTGCCCGGA CTGCCCGGAC TGACTGCGGT TTCCACACTG

451 TTCGGCACAA TGTCGAGCTG GGTCGGCATA GGCGGCGGTT CACTTTCCGT

501 CCCCTTCTTA ATCCACTGCG GCTTCCCCGC CCATAAAGCC ATCGGCACAT

551 CATCCGGCCT TGCCTGGCCG ATTGCACTCT CCGGCGCAAT ATCGTATCTG

601 CTCAACGGCC TGAATATTGC AGGATTGCCC GAAGGGTCAC TGGGCTTCCT

651 TTACCTGCCC GCCGTCGCCG TCCTCAGCGC GGCAACCATT GCCTTTGCCC

701 CGCTCGGTGT CAAAACCGCC CACAAACTTT CTTCTGCCAA ACTCAAAAAA

751 TCCTTCGGCA TTATGTTGCT TTTGATTGCC GGAAAAATGC TGTACAACCT

801 GCTTTAA

This encodes a protein having amino acid sequence <SEQ ID 90>:

1 MWHWDIILIL LAVGSAAGFI AGLFGVGGGT LIVPVVLWVL DLQGLAQHPY

51 AQHLAVGTSF AVMVFTAFSS MLGQHKKQAV DWKTVFTMMP GMVFGVFAGA

101 LSAKYIPAFG LQIFFILFLT AVAFKTLHTD PQTASRPLPG LPGLTAVSTL

151 FGTMSSWVGI GGGSLSVPFL IHCGFPAHKA IGTSSGLAWP IALSGAISYL

201 LNGLNIAGLP EGSLGFLYLP AVAVLSAATI AFAPLGVKTA HKLSSAKLKK

251 SFGIMLLLIA GKMLYNLL*



ORF17a and ORF17-1 show 98.9% identity in 268 aa overlap:

10 20 30 40 50 60

orf17a.pep MWHWDIILILLAVGSAAGFIAGLFGVGGGTLIVPVVLWVLDLQGLAQHPYAQHLAVGTSF
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf17-1    MWHWDIILILLAVGSAAGFIAGLFGVGGGTLIVPVVLWVLDLQGLAQHPYAQHLAVGTSF

10 20 30 40 50 60

70 80 90 100 110 120

orf17a.pep AVMVFTAFSSMLGQHKKQAVDWKTVFTMMPGMVFGVFAGALSAKYIPAFGLQIFFILFLT
           ||||||||||||||||||||||||||||||||:||||:||||||||||||||||||||||
orf17-1    AVMVFTAFSSMLGQHKKQAVDWKTVFTMMPGMIFGVFTGALSAKYIPAFGLQIFFILFLT

70 80 90 100 110 120

130 140 150 160 170 180

orf17a.pep AVAFKTLHTDPQTASRPLPGLPGLTAVSTLFGTMSSWVGIGGGSLSVPFLIHCGFPAHKA
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf17-1    AVAFKTLHTDPQTASRPLPGLPGLTAVSTLFGTMSSWVGIGGGSLSVPFLIHCGFPAHKA

130 140 150 160 170 180

190 200 210 220 230 240

orf17a.pep IGTSSGLAWPIALSGAISYLLNGLNIAGLPEGSLGFLYLPAVAVLSAATIAFAPLGVKTA
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf17-1    IGTSSGLAWPIALSGAISYLLNGLNIAGLPEGSLGFLYLPAVAVLSAATIAFAPLGVKTA

190 200 210 220 230 240

250 260 269

orf17a.pep HKLSSAKLKKSFGIMLLLIAGKMLYNLLX
           |||||||||| |||||||||||||||||
orf17-1    HKLSSAKLKKXFGIMLLLIAGKMLYNLLX

250 260

Homology with a predicted ORF from N. gonorrhoeae

ORF17 shows 93.9% identity over a 196aa overlap with a predicted ORF (ORF17ng) from N. gonorrhoeae:

orf17.pep                               GQHKKQAVNGKTVFTMMPGMIFGVFTGAFS 30
                                        ||||||||: ||:|:||||||||||:||:|
orf17ng   QGLAQHPYAQHLAVGTSFAVMVFTAFSSMLGQHKKQAVDWKTIFAMMPGMIFGVFAGALS 102

orf17.pep AKYIPAFGLQIFFILFLTAVAFKTLHTDPQTASRPLPGLPXLTAVSTLFGTMSSWVGIGG 90
          |||||||||||||||||||||||||||  ||||||||||| |||||||||:|||||||||
orf17ng   AKYIPAFGLQIFFILFLTAVAFKTLHTGRQTASRPLPGLPGLTAVSTLFGAMSSWVGIGG 162

orf17.pep GSLSVPFLIHCGFPAHKAIGTSSGLAWPIALSGAISYLLNGLNIAGLPEGSLGFLYLPAV 150
          ||||||||||||||||||||||||||||||||||||||:|||||||||||||||||||||
orf17ng   GSLSVPFLIHCGFPAHKAIGTSSGLAWPIALSGAISYLVNGLNIAGLPEGSLGFLYLPAV 222

orf17.pep AVLSAATIAFAPLGVKTAHKLSSAKLKKSFGIMLLLIAGKMLYNLL 196
          |||||||||||||||||||||||||||:||||||||||||||||||
orf17ng   AVLSAATIAFAPLGVKTAHKLSSAKLKESFGIMLLLIAGKMLYNLL 268

An ORF17ng nucleotide sequence <SEQ ID 91> is predicted to encode a protein having amino acid 
sequence <SEQ ID 92>: 

1 MWHWDIILIL LAVGSAAGFI AGLFGVGGGT LIVPVVLWVL DLQGLAQHPY

51 AQHLAVGTSF AVMVFTAFSS MLGQHKKQAV DWKTIFAMMP GMIFGVFAGA 

101 LSAKYIPAFG LQIFFILFLT AVAFKTLHTG RQTASRPLPG LPGLTAVSTL 

151 FGAMSSWVGI GGGSLSVPFL IHCGFPAHKA IGTSSGLAWP IALSGAISYL 

201 VNGLNIAGLP EGSLGFLYLP AVAVLSAATI AFAPLGVKTA HKLSSAKLKE 

251 SFGIMLLLIA GKMLYNLL* 

Further work revealed the complete gonococcal DNA sequence <SEQ ID 93>: 

1 ATGTGGCATT GGGACATTAT CTTAATCCTG CTTGCcgtag gcAGTGCGGC 

51 AGGTTTTATT GCCGGCCTGT Tcggtgtagg cggcgGTACG CTGATTGTCC 

101 CTGTCGTTTT ATGGGTGCTT GATTTGCAGG GTTTGGCACA ACATCCTTAC 

151 GCGCAACACC TCGCCGTCGG CAcaTccttC gcCGTCATGG TCTTCACCGC 

201 CTTTTCCAGT ATGTTGGGGC AGCACAAAAA ACAGGCGGTC GACTGGAAAA 

251 CCATATTTGC GATGATGCCG GGTATGATAT TCGGCGTATT CGCTGGCGCA 

301 CTCTCCGCAA AATATATCCC CGCGTTCGGG CTTCAAATTT TCTTCATCCT 

351 GTTTTTAACC GCCGTCGCAT TCAAAACACT GCATACCGGT CGTCAGACGG 

401 CATCCCGCCC GCTGCCCGGG CTGCCCGGAC TGACTGCGGT TTCCACACTG 

451 TTCGGCGCAA TGTCGAGCTG GGTCGGCATA GGCGGCGGTT CACTTTCCGT 

501 CCCCTTCTTA ATCCACTGCG GCTTCCCCGC CCATAAAGCC ATCGGCACAT 

551 CATCCGGCCT TGCCTGGCCG ATTGCACTCT CCGGCGCAAT ATCGTATCTG 

601 GTCAACGGTC TGAATATTGC AGGATTGCCC GAAGGGTCGC TGGGCTTCCT 

651 TTACCTGCCC GCCGTCGCCG TCCTCAGCGC GGCAACCATT GCCTTTGCCC 

701 CGCTCGGTGT CAAAACCGCC CACAAACTTT CTTCTGCCAA ACTCAAAGAA 

751 TCCTTCGGCA TTATGTTGCT TTTGATTGCC GGAAAAATGC TGTACAACCT 

801 GCTTTAA 

This corresponds to the amino acid sequence <SEQ ID 94; ORF17ng-1>:

1 MWHWDIILIL LAVGSAAGFI AGLFGVGGGT LIVPVVLWVL DLQGLAQHPY

51 AQHLAVGTSF AVMVFTAFSS MLGQHKKQAV DWKTIFAMMP GMIFGVFAGA

101 LSAKYIPAFG LQIFFILFLT AVAFKTLHTG RQTASRPLPG LPGLTAVSTL

151 FGAMSSWVGI GGGSLSVPFL IHCGFPAHKA IGTSSGLAWP IALSGAISYL

201 VNGLNIAGLP EGSLGFLYLP AVAVLSAATI AFAPLGVKTA HKLSSAKLKE

251 SFGIMLLLIA GKMLYNLL*

ORF17ng-l and ORF17-1 show 96.6% identity in 268 aa overlap: 



                    10        20        30        40        50        60
orf17-1.pep MWHWDIILILLAVGSAAGFIAGLFGVGGGTLIVPVVLWVLDLQGLAQHPYAQHLAVGTSF
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf17ng-1   MWHWDIILILLAVGSAAGFIAGLFGVGGGTLIVPVVLWVLDLQGLAQHPYAQHLAVGTSF
                    10        20        30        40        50        60

                    70        80        90       100       110       120
orf17-1.pep AVMVFTAFSSMLGQHKKQAVDWKTVFTMMPGMIFGVFTGALSAKYIPAFGLQIFFILFLT
            ||||||||||||||||||||||||:|:||||||||||:||||||||||||||||||||||
orf17ng-1   AVMVFTAFSSMLGQHKKQAVDWKTIFAMMPGMIFGVFAGALSAKYIPAFGLQIFFILFLT
                    70        80        90       100       110       120

                   130       140       150       160       170       180
orf17-1.pep AVAFKTLHTDPQTASRPLPGLPGLTAVSTLFGTMSSWVGIGGGSLSVPFLIHCGFPAHKA
            |||||||||  |||||||||||||||||||||:|||||||||||||||||||||||||||
orf17ng-1   AVAFKTLHTGRQTASRPLPGLPGLTAVSTLFGAMSSWVGIGGGSLSVPFLIHCGFPAHKA
                   130       140       150       160       170       180

                   190       200       210       220       230       240
orf17-1.pep IGTSSGLAWPIALSGAISYLLNGLNIAGLPEGSLGFLYLPAVAVLSAATIAFAPLGVKTA
            ||||||||||||||||||||:|||||||||||||||||||||||||||||||||||||||
orf17ng-1   IGTSSGLAWPIALSGAISYLVNGLNIAGLPEGSLGFLYLPAVAVLSAATIAFAPLGVKTA
                   190       200       210       220       230       240

                   250       260      269
orf17-1.pep HKLSSAKLKKXFGIMLLLIAGKMLYNLLX
            |||||||||: |||||||||||||||||
orf17ng-1   HKLSSAKLKESFGIMLLLIAGKMLYNLLX
                   250       260
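
The identity figure can be reproduced directly from the alignment. Over the 268 aligned positions, ORF17-1 and ORF17ng-1 differ at nine, so 259/268 = 96.6%. The terminal X marks the stop codon and lies outside the overlap.

The Python sketch below does this count. The helper name `percent_identity` is hypothetical. It assumes gap-free, equal-length aligned segments, which holds for this alignment but not for alignments with gaps.

```python
def percent_identity(a: str, b: str) -> tuple[int, int, float]:
    """Count identical positions in two gap-free, equal-length aligned segments.

    Returns (identities, overlap, percent identity rounded to one decimal).
    """
    if len(a) != len(b):
        raise ValueError("aligned segments must have the same length")
    identities = sum(1 for x, y in zip(a, b) if x == y)
    return identities, len(a), round(100 * identities / len(a), 1)


# ORF17-1 and ORF17ng-1 as aligned above, without the terminal stop (X).
ORF17_1 = (
    "MWHWDIILILLAVGSAAGFIAGLFGVGGGTLIVPVVLWVLDLQGLAQHPYAQHLAVGTSF"
    "AVMVFTAFSSMLGQHKKQAVDWKTVFTMMPGMIFGVFTGALSAKYIPAFGLQIFFILFLT"
    "AVAFKTLHTDPQTASRPLPGLPGLTAVSTLFGTMSSWVGIGGGSLSVPFLIHCGFPAHKA"
    "IGTSSGLAWPIALSGAISYLLNGLNIAGLPEGSLGFLYLPAVAVLSAATIAFAPLGVKTA"
    "HKLSSAKLKKXFGIMLLLIAGKMLYNLL"
)
ORF17NG_1 = (
    "MWHWDIILILLAVGSAAGFIAGLFGVGGGTLIVPVVLWVLDLQGLAQHPYAQHLAVGTSF"
    "AVMVFTAFSSMLGQHKKQAVDWKTIFAMMPGMIFGVFAGALSAKYIPAFGLQIFFILFLT"
    "AVAFKTLHTGRQTASRPLPGLPGLTAVSTLFGAMSSWVGIGGGSLSVPFLIHCGFPAHKA"
    "IGTSSGLAWPIALSGAISYLVNGLNIAGLPEGSLGFLYLPAVAVLSAATIAFAPLGVKTA"
    "HKLSSAKLKESFGIMLLLIAGKMLYNLL"
)

print(*percent_identity(ORF17_1, ORF17NG_1))  # 259 268 96.6
```

The same count gives 199/201 = 99.0% for ORF18a and ORF18-1, where the only differences are the two X residues in ORF18a.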

In addition, ORF17ng-1 shows significant homology with a hypothetical H. influenzae protein:

sp|P44070|Y902_HAEIN HYPOTHETICAL PROTEIN HI0902 pir||G64015 hypothetical protein
HI0902 - Haemophilus influenzae (strain Rd KW20) gi|1573922 (U32772) H. influenzae
predicted coding region HI0902 [Haemophilus influenzae] Length = 264

Score = 74 (34.9 bits), Expect = 1.6e-23, Sum P(2) = 1.6e-23
Identities = 15/43 (34%), Positives = 23/43 (53%)

Query:    55 AVGTSFAVMVFTAFSSMLGQHKKQAVDWKTIFAMMPGMIFGVF 97
             A+GTSFA +V T   S    HK   + W+ +  + P ++  VF
Sbjct:    52 ALGTSFATIVITGIGSAQRHHKLGNIVWQAVRILAPVIMLSVF 94

Score = 195 (91.9 bits), Expect = 1.6e-23, Sum P(2) = 1.6e-23
Identities = 44/114 (38%), Positives = 65/114 (57%)

Query:   150 LFGAMSSWVGIGGGSLSVPFLIHCGFPAHKAIGTSSGLAWPIALSGAISYLVNGLNIAGL 209
             L G  SS  GIGGG   VPFL   G    +AIG+S+     + +SG  S++V+G     +
Sbjct:   148 LIGMASSAAGIGGGGFIVPFLTARGINIKQAIGSSAFCGMLLGISGMFSFIVSGWGNPLM 207

Query:   210 PEGSLGFLYLPAVAVLSAATIAFAPLGVKTAHKLSSAKLKESFGIMLLLIAGKM 263
             PE SLG++YLPAV  ++A +   + LG     KL  + LK+ F + L+++A  M
Sbjct:   208 PEYSLGYIYLPAVLGITATSFFTSKLGASATAKLPVSTLKKGFALFLIVVAINM 261
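
The bracketed percentages in the BLAST summary lines are whole numbers, and they appear to be truncated rather than rounded. For example, 15/43 is 34.9% but is printed as 34%, and 44/114 is 38.6% but is printed as 38%. The 44/114 and 65/114 figures cover both printed rows of the second alignment (Query 150-209 and 210-263, that is, 60 + 54 positions).

The short Python check below assumes truncation toward zero. The helper name is hypothetical.

```python
def truncated_percent(count: int, length: int) -> int:
    """Whole-number percentage rounded down (assumed to match the BLAST summary lines)."""
    return 100 * count // length


# (count, aligned length, percentage as printed in the HI0902 summary lines)
REPORTED = [(15, 43, 34), (23, 43, 53), (44, 114, 38), (65, 114, 57)]

for count, length, printed in REPORTED:
    exact = 100 * count / length
    print(f"{count}/{length}: exact {exact:.1f}%, "
          f"truncated {truncated_percent(count, length)}%, printed {printed}%")
```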



This analysis, including the homology with the hypothetical H. influenzae transmembrane protein,
suggests that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be
useful antigens for vaccines or diagnostics, or for raising antibodies.



Example 12 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 95>: 



1 ..GGAAACGGAT GGCAGGCAGA CCCCGAACAT CCGCTGCTCG GGCTTTTTGC
51 CGTCAGTAAT GTATCGATGA CGCTTGCTTT TGTCGGAATA TGTGCGTTGG
101 TGCATTATTG CTTTTCGGGA ACGGTTCAAG TGTTTGTGTT TGCGGCACTG
151 CTCAAACTTT ATGCGCTGAA GCCGGTTTAT TGGTTCGTGT TGCAGTTTGT
201 GCTGATGGCG GTTGCCTATG TCCACCGCTG CGGTATAGAC CGGCAGCCGC
251 CGTCAACGTT CGGCGGCTCG CAGCTGCGAC TCGGCGGGTT GACGGCAGCG
301 TTGATGCAGG TCTCGGTACT GGTGCTGCTG CTTTCAGAAA TTGGAAGATA
351 A

This corresponds to the amino acid sequence <SEQ ID 96; ORF18>: 

1 ..GNGWQADPEH PLLGLFAVSN VSMTLAFVGI CALVHYCFSG TVQVFVFAAL
51 LKLYALKPVY WFVLQFVLMA VAYVHRCGID RQPPSTFGGS QLRLGGLTAA 
101 LMQVSVLVLL LSEIGR* 

Further work revealed the complete nucleotide sequence <SEQ ID 97>: 



1 ATGATTTTGC TGCATTTGGA TTTTTTGTCT GCCTTACTGT ATGCGGCGGT 

51 TTTTCTGTTT CTGATATTCC GCGCAGGAAT GTTGCAATGG TTTTGGGCGA 

101 GTATTATGCT GTGGCTGGGC ATATCGGTTT TGGGGGCAAA GCTGATGCCC 

151 GGCATATGGG GAATGACCCG CGCCGCGCCC TTGTTCATCC CCCATTTTTA 

201 CCTGACTTTG GGCAGCATAT TTTTTTTCAT CGGGCATTGG AACCGGAAAA 

251 CAGATGGAAA CGGATGGCAG GCAGACCCCG AACATCCGCT GCTCGGGCTT 

301 TTTGCCGTCA GTAATGTATC GATGACGCTT GCTTTTGTCG GAATATGTGC 

351 GTTGGTGCAT TATTGCTTTT CGGGAACGGT TCAAGTGTTT GTGTTTGCGG 

401 CACTGCTCAA ACTTTATGCG CTGAAGCCGG TTTATTGGTT CGTGTTGCAG 

451 TTTGTGCTGA TGGCGGTTGC CTATGTCCAC CGCTGCGGTA TAGACCGGCA 

501 GCCGCCGTCA ACGTTCGGCG GCTCGCAGCT GCGACTCGGC GGGTTGACGG 

551 CAGCGTTGAT GCAGGTCTCG GTACTGGTGC TGCTGCTTTC AGAAATTGGA 

601 AGATAA 

This corresponds to the amino acid sequence <SEQ ID 98; ORF18-1>:

1 MILLHLDFLS ALLYAAVFLF LIFRAGMLQW FWASIMLWLG ISVLGAKLMP
51 GIWGMTRAAP LFIPHFYLTL GSIFFFIGHW NRKTDGNGWQ ADPEHPLLGL
101 FAVSNVSMTL AFVGICALVH YCFSGTVQVF VFAALLKLYA LKPVYWFVLQ
151 FVLMAVAYVH RCGIDRQPPS TFGGSQLRLG GLTAALMQVS VLVLLLSEIG
201 R*

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A) 

ORF18 shows 98.3% identity over a 116aa overlap with an ORF (ORF18a) from strain A of N.
meningitidis:

                                                  10        20        30
orf18.pep                                 GNGWQADPEHPLLGLFAVSNVSMTLAFVGI
                                          ||||||||||||||||||||||||||||||
orf18a      TRAAPLFIPHFYLTLGSIFFFIGHWNRKTDGNGWQADPEHPLLGLFAVSNVSMTLAFVGI
               60        70        80        90       100       110

                    40        50        60        70        80        90
orf18.pep   CALVHYCFSGTVQVFVFAALLKLYALKPVYWFVLQFVLMAVAYVHRCGIDRQPPSTFGGS
            ||||||||| ||||||||||||||||||||||||||||||||||||||||||||||||||
orf18a      CALVHYCFSXTVQVFVFAALLKLYALKPVYWFVLQFVLMAVAYVHRCGIDRQPPSTFGGS
              120       130       140       150       160       170

                   100       110
orf18.pep   QLRLGGLTAALMQVSVLVLLLSEIGRX
            ||||||||||||| ||||||||||||
orf18a      QLRLGGLTAALMQXSVLVLLLSEIGRX
              180       190       200

The complete length ORF18a nucleotide sequence <SEQ ID 99> is:



1 ATGATTTTGC TGCATTTGGA TTTTTTGTCT GCCTTACTGT ATGCGGCGGT 

51 TTTTCTGTTT CTGATATTCC GCGCAGGAAT GTTGCAATGG TTTTGGGCGA 

101 GTATTATGCT GTGGCTGGGC ATATCGGTTT TGGGGGCAAA GCTGATGCCC 

151 GGCATATGGG GAATGACCCG CGCCGCGCCC TTGTTCATCC CCCATTTTTA 

201 CCTGACTTTG GGCAGCATAT TTTTTTTCAT CGGGCATTGG AACCGGAAAA 

251 CGGATGGAAA CGGATGGCAG GCAGACCCCG AACATCCTCT GCTCGGGCTG 

301 TTTGCCGTCA GTAATGTATC GATGACGCTT GCTTTTGTCG GAATATGTGC 

351 GTTGGTGCAT TATTGCTTTT CGNGAACGGT TCAAGTGTTT GTGTTTGCGG 

401 CACTGCTCAA ACTTTATGCG CTGAAGCCGG TTTATTGGTT CGTGTTGCAG 
451 TTTGTGCTGA TGGCGGTTGC CTATGTCCAC CGCTGCGGTA TAGACCGGCA
501 GCCGCCGTCA ACGTTCGGCG GNTCGCAGCT GCGACTCGGC GGGTTGACGG 
551 CAGCGTTGAT GCAGNTCTCG GTACTGGTGC TGCTGCTTTC AGAAATTGGA 
601 AGATAA 

This encodes a protein having amino acid sequence <SEQ ID 100>: 

1 MILLHLDFLS ALLYAAVFLF LIFRAGMLQW FWASIMLWLG ISVLGAKLMP
51 GIWGMTRAAP LFIPHFYLTL GSIFFFIGHW NRKTDGNGWQ ADPEHPLLGL
101 FAVSNVSMTL AFVGICALVH YCFSXTVQVF VFAALLKLYA LKPVYWFVLQ
151 FVLMAVAYVH RCGIDRQPPS TFGGSQLRLG GLTAALMQXS VLVLLLSEIG
201 R*

ORF18a and ORF18-1 show 99.0% identity in 201 aa overlap: 



                    10        20        30        40        50        60
orf18a.pep  MILLHLDFLSALLYAAVFLFLIFRAGMLQWFWASIMLWLGISVLGAKLMPGIWGMTRAAP
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf18-1     MILLHLDFLSALLYAAVFLFLIFRAGMLQWFWASIMLWLGISVLGAKLMPGIWGMTRAAP
                    10        20        30        40        50        60

                    70        80        90       100       110       120
orf18a.pep  LFIPHFYLTLGSIFFFIGHWNRKTDGNGWQADPEHPLLGLFAVSNVSMTLAFVGICALVH
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf18-1     LFIPHFYLTLGSIFFFIGHWNRKTDGNGWQADPEHPLLGLFAVSNVSMTLAFVGICALVH
                    70        80        90       100       110       120

                   130       140       150       160       170       180
orf18a.pep  YCFSXTVQVFVFAALLKLYALKPVYWFVLQFVLMAVAYVHRCGIDRQPPSTFGGSQLRLG
            |||| |||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf18-1     YCFSGTVQVFVFAALLKLYALKPVYWFVLQFVLMAVAYVHRCGIDRQPPSTFGGSQLRLG
                   130       140       150       160       170       180

                   190       200
orf18a.pep  GLTAALMQXSVLVLLLSEIGRX
            |||||||| ||||||||||||
orf18-1     GLTAALMQVSVLVLLLSEIGRX
                   190       200

Homology with a predicted ORF from N. gonorrhoeae

ORF18 shows 93.1% identity over a 116aa overlap with a predicted ORF (ORF18.ng) from N. 
gonorrhoeae: 

orf18.pep                                 GNGWQADPEHPLLGLFAVSNVSMTLAFVGI 30
                                          ||||||||||||||||||||||||||||||
orf18ng     TRAAPLFIPHFYLTLGSIFFFIGYWNRKTDGNGWQADPEHPLLGLFAVSNVSMTLAFVGI 115

orf18.pep   CALVHYCFSGTVQVFVFAALLKLYALKPVYWFVLQFVLMAVAYVHRCGIDRQPPSTFGGS 90
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf18ng     CALVHYCFSGTVQVFVFAALLKLYALKPVYWFVLQFVLMAVAYVHRCGIDRQPPSTFGGS 175

orf18.pep   QLRLGGLTAALMQVSVLVLLLSEIGR 116
            ||||| |:| ||||:| ::||:||||
orf18ng     QLRLGVLAAMLMQVAVTAMLLAEIGR 201

The complete length ORF18ng nucleotide sequence is <SEQ ID 101>: 



1 ATGATTTTGC TGCATTTGGA TTTTTTGTCT GCCTTACTGt aTGCGGcggt
51 tttTctgTTT CTGATATTCC GCGCAGGAAT GTTGCAATGG TTTTGGGCGA
101 GTATTGCGTT GTGGCTCGGC ATCTCGGTTT TAGGGGTAAA GCTGATGCCG
151 GGGATGTGGG GAATGACCCG CGCCGCGCCT TTGTTCATCC CCCATTTTTA
201 CCTGACTTTG GGCAGCATAT TTTTTTTCAT CGGGTATTGG AACCGGAAAA
251 CAGATGGAAA CGGATGGCAG GCAGACCCCG AACATCCGCT GCTCGGGCTT
301 TTTGCCGTCA GTAATGTATC GATGACGCTT GCTTTTGTCG GAATATGTGC
351 GTTGGTGCAT TATTGCTTTT CGGGAACGGT TCAAGTGTTT GTGTTTGCGG
401 CATTGCTCAA ACTTTATGCG CTGAAGCCGG TTTATTGGTT CGTGTTGCAG
451 TTTGTATTGA TGGCGGttgC CTATGTCCAC CGCTGCGGTA TAGACCGGCA
501 GCCGCCGTCA ACGTTCGGCG GTTCGCAGCT GCGACTCGGC GTGTTGGCGG
551 CGATGTTGAT GCAGGTTGCG GTAACGGCGA TGCTGCTTGC CGAAATCGGC
601 AGATGA

This encodes a protein having amino acid sequence <SEQ ID 102>: 

1 MILLHLDFLS ALLYAAVFLF LIFRAGMLQW FWASIALWLG ISVLGVKLMP
51 GMWGMTRAAP LFIPHFYLTL GSIFFFIGYW NRKTDGNGWQ ADPEHPLLGL
101 FAVSNVSMTL AFVGICALVH YCFSGTVQVF VFAALLKLYA LKPVYWFVLQ
151 FVLMAVAYVH RCGIDRQPPS TFGGSQLRLG VLAAMLMQVA VTAMLLAEIG
201 R*

This ORF18ng protein sequence shows 94.0% identity in 201 aa overlap with ORF18-1: 

                    10        20        30        40        50        60
orf18-1.pep MILLHLDFLSALLYAAVFLFLIFRAGMLQWFWASIMLWLGISVLGAKLMPGIWGMTRAAP
            ||||||||||||||||||||||||||||||||||| |||||||||:|||||:||||||||
orf18ng     MILLHLDFLSALLYAAVFLFLIFRAGMLQWFWASIALWLGISVLGVKLMPGMWGMTRAAP
                    10        20        30        40        50        60

                    70        80        90       100       110       120
orf18-1.pep LFIPHFYLTLGSIFFFIGHWNRKTDGNGWQADPEHPLLGLFAVSNVSMTLAFVGICALVH
            |||||||||||||||||||:||||||||||||||||||||||||||||||||||||||||
orf18ng     LFIPHFYLTLGSIFFFIGYWNRKTDGNGWQADPEHPLLGLFAVSNVSMTLAFVGICALVH
                    70        80        90       100       110       120

                   130       140       150       160       170       180
orf18-1.pep YCFSGTVQVFVFAALLKLYALKPVYWFVLQFVLMAVAYVHRCGIDRQPPSTFGGSQLRLG
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf18ng     YCFSGTVQVFVFAALLKLYALKPVYWFVLQFVLMAVAYVHRCGIDRQPPSTFGGSQLRLG
                   130       140       150       160       170       180

                   190       200
orf18-1.pep GLTAALMQVSVLVLLLSEIGRX
             |:| ||||:| ::||:||||
orf18ng     VLAAMLMQVAVTAMLLAEIGRX
                   190       200



Based on this analysis, including the presence of several putative transmembrane domains in the
gonococcal protein, it is predicted that the proteins from N. meningitidis and N. gonorrhoeae, and
their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies.
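
The conclusion above relies on putative transmembrane domains, but the prediction method is not stated. One common approach is a Kyte-Doolittle hydropathy scan, shown here only as an illustrative sketch; it is not necessarily the method used. The scan averages residue hydropathy values over a sliding window and flags runs of windows whose mean exceeds a cutoff.

The sketch makes these assumptions:
- The window of 19 residues and the cutoff of 1.6 are conventional values for membrane-spanning segments; they are not taken from this document.
- X scores 0.
- Spaces, position numbers and `*` are ignored.
- The function names are hypothetical.

```python
# Kyte-Doolittle hydropathy values (Kyte & Doolittle, 1982).
KD_SCALE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_windows(protein: str, window: int = 19) -> list[float]:
    """Mean hydropathy of every complete window over the letters of `protein`.

    Non-letters (spaces, position numbers, '*') are skipped; X scores 0.
    """
    scores = [KD_SCALE.get(res, 0.0) for res in protein.upper() if res.isalpha()]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


def candidate_tm_segments(protein: str, window: int = 19,
                          cutoff: float = 1.6) -> list[tuple[int, int]]:
    """1-based (start, end) residue spans covered by consecutive windows above cutoff."""
    means = hydropathy_windows(protein, window)
    spans: list[tuple[int, int]] = []
    start = None
    for i, value in enumerate(means):
        if value > cutoff and start is None:
            start = i  # first window of a hydrophobic run
        elif value <= cutoff and start is not None:
            spans.append((start + 1, i - 1 + window))  # last window was i - 1
            start = None
    if start is not None:  # run continues to the C-terminus
        spans.append((start + 1, len(means) - 1 + window))
    return spans
```

You can apply it to any of the protein listings above, for example SEQ ID 102, by passing the residues as a single string.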

Example 13 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 103>:



1 ATGAAAACCC CACTCCTCAA GCCTCTGCTN ATTACCTCGC TTCCCGTTTT 

51 CGCCAGTGTT TTTACCGCCG CCTCCATCGT CTGGCAGCTA GGCGAACCCA 

101 AGCTCGCCAT GCCCTTCGTA CTCGGCATCA TCGCCGGCGG CCTTGTCGAT 

151 TTGGACAACC NCNTGACCGG ACGGCTNAAA AACATCATCA CCACCGTCGC 

201 CCTGTTCACC CTCTCCTCGC TCACGGCACA AAGCACCCTC GGCACAGGGC 

251 TGCCCTTCAT CCTCGCCATG ACCCTGATGA CTT.CG.CTT CACCATTTTA 

301 GGCGCGGNCG . . . 

This corresponds to the amino acid sequence <SEQ ID 104; ORF19>: 



1 MKTPLLKPLL ITSLPVFASV FTAASIVWQL GEPKLAMPFV LGIIAGGLVD 

51 LDNXXTGRLK NIITTVALFT LSSLTAQSTL GTGLPFILAM TLMTXXFTIL 

101 GAX. . . 

Further work revealed the complete nucleotide sequence <SEQ ID 105>: 

1 ATGAAAACCC CACTCCTCAA GCCTCTGCTC ATTACCTCGC TTCCCGTTTT 

51 CGCCAGTGTT TTTACCGCCG CCTCCATCGT CTGGCAGCTA GGCGAACCCA 

101 AGCTCGCCAT GCCCTTCGTA CTCGGCATCA TCGCCGGCGG CCTTGTCGAT 

151 TTGGACAACC GCCTGACCGG ACGGCTGAAA AACATCATCA CCACCGTCGC 
201 CCTGTTCACC CTCTCCTCGC TCACGGCACA AAGCACCCTC GGCACAGGGC

251 TGCCCTTCAT CCTCGCCATG ACCCTGATGA CCTTCGGCTT CACCATTTTA 

301 GGCGCGGTCG GGCTCAAATA CCGCACCTTC GCCTTCGGTG CACTCGCCGT 

351 CGCCACCTAC ACCACACTTA CCTACACCCC CGAAACCTAC TGGCTGACCA 

401 ACCCCTTCAT GATTTTATGC GGCACCGTAC TGTACAGCAC CGCCATCCTC 

451 CTGTTCCAAA TCGTCCTGCC CCACCGCCCC GTCCAAGAAA GCGTCGCCAA 

501 CGCCTACGAC GCACTCGGCG GCTACCTCGA AGCCAAAGCC GACTTCTTCG 

551 ACCCCGATGA GGCAGCCTGG ATAGGCAACC GCCACATCGA CCTCGCCATG 

601 AGCAACACCG GCGTCATCAC CGCCTTCAAC CAATGCCGTT CCGCCCTGTT 

651 TTACCGCCTT CGCGGCAAAC ACCGCCACCC GCGCACCGCC AAAATGCTGC 

701 GTTACTACTT TGCCGCCCAA GACATACACG AACGCATCAG CTCCGCCCAC 

751 GTCGATTATC AGGAAATGTC CGAAAAATTC AAAAACACCG ACATCATCTT 

801 CCGCATCCAC CGCCTGCTCG AAATGCAGGG ACAAGCCTGC CGCAACACCG 

851 CCCAAGCCCT GCGCGCAAGC AAAGACTACG TTTACAGCAA ACGCCTCGGC 

901 CGCGCCATCG AAGGCTGCCG CCAATCGCTG CGCCTCCTTT CAGACAGCAA 

951 CGACAGTCCC GACATCCGCC ACCTGCGCCG CCTTCTCGAC AACCTCGGCA 

1001 GCGTCGACCA GCAGTTCCGC CAACTCCAGC ACAACGGCCT GCAGGCAGAA 

1051 AACGACCGCA TGGGCGACAC CCGCATCGCC GCCCTCGAAA CCAGCAGCCT 

1101 CAAAAACACC TGGCAGGCAA TCCGTCCGCA GCTAAACCTC GAATCAGGCG 

1151 TATTCCGCCA TGCCGTCCGC CTGTCCCTCG TCGTTGCCGC CGCCTGCACC 

1201 ATCGTCGAAG CCCTCAACCT CAACCTCGGC TACTGGATAC TACTGACCGC 

1251 CCTTTTCGTC TGCCAACCCA ACTACACCGC CACCAAAAGC CGCGTCCGCC 

1301 AGCGCATCGC CGGCACCGTA CTCGGCGTAA TCGTCGGCTC GCTCGTCCCC 

1351 TACTTCACCC CGTCTGTCGA AACCAAACTC TGGATTGTCA TCGCCAGTAC 

1401 CACCCTCTTT TTCATGACCC GCACCTACAA ATACAGTTTC TCCACCTTCT 

1451 TCATTACCAT TCAAGCCCTG ACCAGCCTCT CCCTCGCAGG TTTGGACGTA 

1501 TACGCCGCCA TGCCCGTACG CATCATCGAC ACCATTATCG GCGCATCCCT

1551 TGCCTGGGCG GCAGTCAGCT ACCTGTGGCC AGACTGGAAA TACCTCACGC 

1601 TCGAACGCAC CGCCGCCCTT GCCGTATGCA GCAACGGTGC CTATCTCGAA 

1651 AAAATCACCG AACGCCTCAA AAGCGGCGAA ACCGGCGACG ACGTCGAATA 

1701 CCGCGCCACC CGCCGCCGCG CCCACGAACA CACCGCCGCC CTCAGCAGCA 

1751 CCCTTTCCGA CATGAGCAGC GAACCCGCAA AATTCGCCGA CAGCCTGCAA 

1801 CCCGGCTTTA CCCTGCTCAA AACCGGCTAC GCCCTGACCG GCTACATCTC 

1851 CGCCCTCGGC GCATACCGCA GCGAAATGCA CGAAGAATGC AGCCCCGACT 

1901 TTACCGCACA GTTCCACCTC GCCGCCGAAC ACACCGCCCA CATCTTCCAA 

1951 CACCTGCCCG AAACCGAACC CGACGACTTT CAGACAGCAC TGGATACACT 

2001 GCGCGGCGAA CTCGACACCC TCCGCACCCA CAGCAGCGGA ACACAAAGCC 

2051 ACATCCTCCT CCAACAGCTC CAACTCATCG CCCGACAGCT CGAACCCTAC 

2101 TACCGCGCCT ACCGCCAAAT TCCGCACAGG CAGCCCCAAA ATGCAGCCTG 

2151 A 

This corresponds to the amino acid sequence <SEQ ID 106; ORF19-l>: 

1 MKTPLLKPLL ITSLPVFASV FTAASIVWQL GEPKLAMPFV LGIIAGGLVD

51 LDNRLTGRLK NIITTVALFT LSSLTAQSTL GTGLPFILAM TLMTFGFTIL

101 GAVGLKYRTF AFGALAVATY TTLTYTPETY WLTNPFMILC GTVLYSTAIL

151 LFQIVLPHRP VQESVANAYD ALGGYLEAKA DFFDPDEAAW IGNRHIDLAM

201 SNTGVITAFN QCRSALFYRL RGKHRHPRTA KMLRYYFAAQ DIHERISSAH

251 VDYQEMSEKF KNTDIIFRIH RLLEMQGQAC RNTAQALRAS KDYVYSKRLG

301 RAIEGCRQSL RLLSDSNDSP DIRHLRRLLD NLGSVDQQFR QLQHNGLQAE

351 NDRMGDTRIA ALETSSLKNT WQAIRPQLNL ESGVFRHAVR LSLVVAAACT

401 IVEALNLNLG YWILLTALFV CQPNYTATKS RVRQRIAGTV LGVIVGSLVP

451 YFTPSVETKL WIVIASTTLF FMTRTYKYSF STFFITIQAL TSLSLAGLDV 

501 YAAMPVRIID TIIGASLAWA AVSYLWPDWK YLTLERTAAL AVCSNGAYLE 

551 KITERLKSGE TGDDVEYRAT RRRAHEHTAA LSSTLSDMSS EPAKFADSLQ 

601 PGFTLLKTGY ALTGYISALG AYRSEMHEEC SPDFTAQFHL AAEHTAHIFQ 

651 HLPETEPDDF QTALDTLRGE LDTLRTHSSG TQSHILLQQL QLIARQLEPY 

701 YRAYRQIPHR QPQNAA* 

Computer analysis of this amino acid sequence gave the following results: 

Homology with predicted transmembrane protein YHFK of H. influenzae (accession number P44289)
ORF19 and YHFK proteins show 45% aa identity in 97 aa overlap: 

orf19    6 LKPLLITSLPVFASVFTAASIVWQLGEPKLAMPFVLGIIAGGLVDLDNXXTGRLKNIITT 65
           L   +I+++PVF +V  AA  +W       +MP +LGIIAGGLVDLDN  TGRLKN+  T
YHFK     5 LNAKVISTIPVFIAVNIAAVGIWFFDISSQSMPLILGIIAGGLVDLDNRLTGRLKNVFFT 64

orf19   66 VALFTLSSLTAQSTLGTGLPFILAMTLMTXXFTILGA 102
           +  F++SS   Q  +G  + +I+ MT++T  FT++GA
YHFK    65 LIAFSISSFIVQLHIGKPIQYIVLMTVLTFIFTMIGA 101

Homology with a predicted ORF from N. meningitidis (strain A)

ORF19 shows 92.2% identity over a 102aa overlap with an ORF (ORF19a) from strain A of N. 
meningitidis: 

                    10        20        30        40        50        60
orf19.pep   MKTPLLKPLLITSLPVFASVFTAASIVWQLGEPKLAMPFVLGIIAGGLVDLDNXXTGRLK
            |||| ||||||||||||||||||||||||||||||||||||||||||||  |||||
orf19a      MKTPPLKPLLITSLPVFASVFTAASIVWQLGEPKLAMPFVLGIIAGGLVDLDNRLTGRLK
                    10        20        30        40        50        60

                    70        80        90       100
orf19.pep   NIITTVALFTLSSLTAQSTLGTGLPFILAMTLMTXXFTILGAX
            |||:||||||||||:|||||||||||||||||||  |||:||
orf19a      NIIATVALFTLSSLVAQSTLGTGLPFILAMTLMTFGFTIMGAVGLKYRTFAFGALAVATY
                    70        80        90       100       110       120

orf19a      TTLTYTPETYWLTNPFMILCGTVLYSTAIILFQIILPHRPVQENVANAYEALGSYLEAKA
                   130       140       150       160       170       180

The complete length ORF19a nucleotide sequence <SEQ ID 107> is: 



1 ATGAAAACCC CACCCCTCAA GCCTCTGCTC ATTACCTCGC TTCCCGTTTT
51 CGCCAGTGTC TTTACCGCCG CCTCCATCGT CTGGCAGCTG GGCGAACCCA
101 AGCTCGCCAT GCCCTTCGTA CTCGGCATCA TCGCTGGCGG CCTGGTCGAT
151 TTGGACAACC GCCTGACCGG ACGGCTGAAA AACATCATCG CCACCGTCGC
201 CCTGTTCACC CTCTCCTCAC TTGTCGCGCA AAGCACCCTC GGCACAGGTT
251 TGCCATTCAT CCTCGCCATG ACCCTGATGA CTTTCGGCTT TACCATCATG
301 GGCGCGGTCG GGCTGAAATA CCGCACCTTC GCCTTCGGCG CACTCGCCGT
351 CGCCACCTAC ACCACACTTA CCTACACCCC CGAAACCTAC TGGCTGACCA
401 ACCCCTTTAT GATTCTGTGC GGAACCGTAC TGTACAGCAC CGCCATCATC
451 CTGTTCCAAA TCATCCTGCC CCACCGCCCC GTTCAAGAAA ACGTCGCCAA
501 CGCCTACGAA GCACTCGGCA GCTACCTCGA AGCCAAAGCC GACTTTTTCG
551 ATCCCGACGA AGCCGAATGG ATAGGCAACC GCCACATCGA CCTCGCCATG
601 AGCAACACCG GCGTCATCAC CGCCTTCAAC CAATGCCGTT CCGCCCTGTT
651 TTACCGCCTT CGCGGCAAAC ACCGCCACCC GCGCACCGCC AAAATGCTGC
701 GCTACTACTT CGCCGCCCAA GACATACACG AACGCATCAG CTCCGCCCAC
751 GTCGACTACC AAGAGATGTC CGAAAAATTC AAAAACACCG ACATCATCTT
801 CCGCATCCAC CGCCTGCTCG AAATGCAGGG ACAAGCCTGC CGCAACACCG
851 CCCAAGCCCT GCGCGCAAGC AAAGACTACG TTTACAGCAA ACGCCTCGGC
901 CGCGCCATCG AAGGCTGCCG CCAATCGCTG CGCCTCCTTT CAGACAGCAA
951 CGACAATCCC GACATCCGCC ACCTGCGCCG CCTTCTCGAC AACCTCGGCA
1001 GCGTCGACCA GCAGTTCCGC CAACTCCAGC ACAACGGCCT GCAGGCAGAA
1051 AACGACCGCA TGGGCGACAC CCGCATCGCC GCCCTCGAAA CCGGCAGCCT
1101 CAAAAACACC TGGCAGGCAA TCCGTCCGCA GCTAAACCTC GAATCAGGCG
1151 TATTCCGCCA TGCCGTCCGC CTGTCCCTTG TCGTTGCCGC CGCCTGCACC
1201 ATCGTCGAAG CCCTCAACCT CAACCTCGGC TACTGGATAC TACTGACCGC
1251 CCTTTTCGTC TGCCAACCCA ACTACACCGC CACCAAAAGC CGCGTCCGCC
1301 AGCGCATCGC CGGCACCGTA CTCGGCGTAA TCGTCGGCTC GCTCGTCCCC
1351 TACTTTACCC CCTCCGTCGA AACCAAACTC TGGATCGTCA TCGCCAGTAC
1401 CACCCTCTTT TTCATGACCC GCACCTACAA ATACAGCTTC TCGACATTTT
1451 TCATCACCAT TCAAGCCCTG ACCAGCCTCT CCCTCGCAGG GTTGGACGTA
1501 TACGCCGCCA TGCCCGTACG CATCATCGAC ACCATTATCG GCGCATCCCT
1551 TGCCTGGGCG GCAGTCAGCT ACCTGTGGCC AGACTGGAAA TACCTCACGC
1601 TCGAACGCAC CGCCGCCCTT GCCGTATGCA GCAACGGCGC CTATCTCGAA
1651 AAAATCACCG AACGCCTCAA AAGCGGCGAA ACCGGCGACG ACGTCGAATA
1701 CCGCGCCACC CGCCGCCGCG CCCACGAACA CACCGCCGCC CTCAGCAGCA
1751 CCCTTTCCGA CATGAGCAGC GAACCCGCAA AATTCGCCGA CAGCCTGCAA
1801 CCCGGCTTTA CCCTGCTCAA AACCGGCTAC GCCCTGACCG GCTACATCTC
1851 CGCCCTCGGC GCATACCGCA GCGAAATGCA CGAAGAATGC AGCCCCGACT
1901 TTACCGCACA GTTCCACCTC GCCGCCGAAC ACACCGCCCA CATCTTCCAA
1951 CACCTGCCCG AAACCGAACC CGACGACTTT CAGACAGCAC TGGATACACT
2001 GCGCGGCGAA CTCGACACCC TCCGCACCCA CAGCAGCGGA ACACAAAGCC
2051 ACATCCTCCT CCAACAGCTC CAACTCATCG CCCGGCAGCT CGAACCCTAC
2101 TACCGCGCCT ACCGACAAAT TCCGCACAGG CAGCCCCAAA ACGCAGCCTG
2151 A

This encodes a protein having amino acid sequence <SEQ ID 108>:

1 MKTPPLKPLL ITSLPVFASV FTAASIVWQL GEPKLAMPFV LGIIAGGLVD

51 LDNRLTGRLK NIIATVALFT LSSLVAQSTL GTGLPFILAM TLMTFGFTIM

101 GAVGLKYRTF AFGALAVATY TTLTYTPETY WLTNPFMILC GTVLYSTAII

151 LFQIILPHRP VQENVANAYE ALGSYLEAKA DFFDPDEAEW IGNRHIDLAM

201 SNTGVITAFN QCRSALFYRL RGKHRHPRTA KMLRYYFAAQ DIHERISSAH

251 VDYQEMSEKF KNTDIIFRIH RLLEMQGQAC RNTAQALRAS KDYVYSKRLG

301 RAIEGCRQSL RLLSDSNDNP DIRHLRRLLD NLGSVDQQFR QLQHNGLQAE

351 NDRMGDTRIA ALETGSLKNT WQAIRPQLNL ESGVFRHAVR LSLVVAAACT

401 IVEALNLNLG YWILLTALFV CQPNYTATKS RVRQRIAGTV LGVIVGSLVP

451 YFTPSVETKL WIVIASTTLF FMTRTYKYSF STFFITIQAL TSLSLAGLDV 

501 YAAMPVRIID TIIGASLAWA AVSYLWPDWK YLTLERTAAL AVCSNGAYLE 

551 KITERLKSGE TGDDVEYRAT RRRAHEHTAA LSSTLSDMSS EPAKFADSLQ 

601 PGFTLLKTGY ALTGYISALG AYRSEMHEEC SPDFTAQFHL AAEHTAHIFQ 

651 HLPETEPDDF QTALDTLRGE LDTLRTHSSG TQSHILLQQL QLIARQLEPY 

701 YRAYRQIPHR QPQNAA* 

ORF19a and ORF19-1 show 98.3% identity in 716 aa overlap:

                    10        20        30        40        50        60
orf19a.pep  MKTPPLKPLLITSLPVFASVFTAASIVWQLGEPKLAMPFVLGIIAGGLVDLDNRLTGRLK
            |||| |||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf19-1     MKTPLLKPLLITSLPVFASVFTAASIVWQLGEPKLAMPFVLGIIAGGLVDLDNRLTGRLK
                    10        20        30        40        50        60

                    70        80        90       100       110       120
orf19a.pep  NIIATVALFTLSSLVAQSTLGTGLPFILAMTLMTFGFTIMGAVGLKYRTFAFGALAVATY
            |||:||||||||||:||||||||||||||||||||||||:||||||||||||||||||||
orf19-1     NIITTVALFTLSSLTAQSTLGTGLPFILAMTLMTFGFTILGAVGLKYRTFAFGALAVATY
                    70        80        90       100       110       120

                   130       140       150       160       170       180
orf19a.pep  TTLTYTPETYWLTNPFMILCGTVLYSTAIILFQIILPHRPVQENVANAYEALGSYLEAKA
            |||||||||||||||||||||||||||||:||||:||||||||:|||||:|||:||||||
orf19-1     TTLTYTPETYWLTNPFMILCGTVLYSTAILLFQIVLPHRPVQESVANAYDALGGYLEAKA
                   130       140       150       160       170       180

                   190       200       210       220       230       240
orf19a.pep  DFFDPDEAEWIGNRHIDLAMSNTGVITAFNQCRSALFYRLRGKHRHPRTAKMLRYYFAAQ
            |||||||| |||||||||||||||||||||||||||||||||||||||||||||||||||
orf19-1     DFFDPDEAAWIGNRHIDLAMSNTGVITAFNQCRSALFYRLRGKHRHPRTAKMLRYYFAAQ
                   190       200       210       220       230       240

                   250       260       270       280       290       300
orf19a.pep  DIHERISSAHVDYQEMSEKFKNTDIIFRIHRLLEMQGQACRNTAQALRASKDYVYSKRLG
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf19-1     DIHERISSAHVDYQEMSEKFKNTDIIFRIHRLLEMQGQACRNTAQALRASKDYVYSKRLG
                   250       260       270       280       290       300

                   310       320       330       340       350       360
orf19a.pep  RAIEGCRQSLRLLSDSNDNPDIRHLRRLLDNLGSVDQQFRQLQHNGLQAENDRMGDTRIA
            ||||||||||||||||||:|||||||||||||||||||||||||||||||||||||||||
orf19-1     RAIEGCRQSLRLLSDSNDSPDIRHLRRLLDNLGSVDQQFRQLQHNGLQAENDRMGDTRIA
                   310       320       330       340       350       360

                   370       380       390       400       410       420
orf19a.pep  ALETGSLKNTWQAIRPQLNLESGVFRHAVRLSLVVAAACTIVEALNLNLGYWILLTALFV
            ||||:|||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf19-1     ALETSSLKNTWQAIRPQLNLESGVFRHAVRLSLVVAAACTIVEALNLNLGYWILLTALFV
                   370       380       390       400       410       420

                   430       440       450       460       470       480
orf19a.pep  CQPNYTATKSRVRQRIAGTVLGVIVGSLVPYFTPSVETKLWIVIASTTLFFMTRTYKYSF
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf19-1     CQPNYTATKSRVRQRIAGTVLGVIVGSLVPYFTPSVETKLWIVIASTTLFFMTRTYKYSF
                   430       440       450       460       470       480

                   490       500       510       520       530       540
orf19a.pep  STFFITIQALTSLSLAGLDVYAAMPVRIIDTIIGASLAWAAVSYLWPDWKYLTLERTAAL
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf19-1     STFFITIQALTSLSLAGLDVYAAMPVRIIDTIIGASLAWAAVSYLWPDWKYLTLERTAAL
                   490       500       510       520       530       540

                   550       560       570       580       590       600
orf19a.pep  AVCSNGAYLEKITERLKSGETGDDVEYRATRRRAHEHTAALSSTLSDMSSEPAKFADSLQ
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf19-1     AVCSNGAYLEKITERLKSGETGDDVEYRATRRRAHEHTAALSSTLSDMSSEPAKFADSLQ
                   550       560       570       580       590       600

                   610       620       630       640       650       660
orf19a.pep  PGFTLLKTGYALTGYISALGAYRSEMHEECSPDFTAQFHLAAEHTAHIFQHLPETEPDDF
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf19-1     PGFTLLKTGYALTGYISALGAYRSEMHEECSPDFTAQFHLAAEHTAHIFQHLPETEPDDF
                   610       620       630       640       650       660

                   670       680       690       700       710
orf19a.pep  QTALDTLRGELDTLRTHSSGTQSHILLQQLQLIARQLEPYYRAYRQIPHRQPQNAAX
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf19-1     QTALDTLRGELDTLRTHSSGTQSHILLQQLQLIARQLEPYYRAYRQIPHRQPQNAAX
                   670       680       690       700       710

Homology with a predicted ORF from N. gonorrhoeae

ORF19 shows 95.1% identity over a 102aa overlap with a predicted ORF (ORF19.ng) from N.
gonorrhoeae:

orf19.pep   MKTPLLKPLLITSLPVFASVFTAASIVWQLGEPKLAMPFVLGIIAGGLVDLDNXXTGRLK 60
            |||||||||||||||||||||||||||||||||||||||||||||||||||||  |||||
orf19ng     MKTPLLKPLLITSLPVFASVFTAASIVWQLGEPKLAMPFVLGIIAGGLVDLDNRLTGRLK 60

orf19.pep   NIITTVALFTLSSLTAQSTLGTGLPFILAMTLMTXXFTILGAX 103
            |||:||||||||||||||||||||||||||||||  ||||||
orf19ng     NIIATVALFTLSSLTAQSTLGTGLPFILAMTLMTFGFTILGAVGLKYRTFAFGALAVATY 120

An ORF19ng nucleotide sequence <SEQ ID 109> is predicted to encode a protein having amino
acid sequence <SEQ ID 110>:



1 MKTPLLKPLL ITSLPVFASV FTAASIVWQL GEPKLAMPFV LGIIAGGLVD
51 LDNRLTGRLK NIIATVALFT LSSLTAQSTL GTGLPFILAM TLMTFGFTIL
101 GAVGLKYRTF AFGALAVATY TTLTYTPETY WLTNPFMILC GTVLYSTAII
151 LFQIILPHRP VQESVANAYE ALGGYLEAKA DFFDPDEAAW IGNRHIDLAM
201 SNTGVITAFN QCRSALFYRL RGKHRHPRTA KMLRYYFAAQ DIHERISSAH
251 VDYQEMSEKF KNTDIIFRIR RLLEMQGQAC RNTAQAIRSG KDYVYSKRLG
301 RAIEGCRQSL RLLSDGNDSP DIRHLSRLLD NLGSVDQQFR QLRHSDSPAE
351 NDRMGDTRIA ALETGSFKNT*



Further work revealed the complete nucleotide sequence <SEQ ID 111>:



1 ATGAAAACCC CACTCCTCAA GCCTCTGCTC ATTACCTCGC TTCCCGTTTT
51 CGCCAGTGTC TTTAGCGCCG CCTCCATCGT CTGGCAGCTA GGCGAACCCA
101 AGCTCGCCAT GCCCTTCGTA CTCGGCATCA TCGCCGGCGG CCTGGTCGAT
151 TTGGACAACC GCCTGACCGG ACGGCTGAAA AACATCATCG CCACCGTCGC
201 CCTGTTTACC CTCTCCTCGC TCACGGCGCA AAGCACCCTC GGCACAGGGC
251 TGCCCTTCAT CCTCGCCATG ACCCTGATGA CCTTCGGCTT TACCATTTTA
301 GGCGCGGTCG GGCTGAAATA CCGCACCTTC GCCTTCGGCG CACTCGCCGT
351 CGCCACCTAC ACCACGCTTA CCTACACCCC CGAAACCTAC TGGCTGACCA
401 ACCCCTTCAT GATTTTATGC GGCACCGTAC TGTACAGCAC CGCCATCATC
451 CTGTTCCAAA TCATCCTGCC CCACCGCCCC GTCCAAGAAA GCGTCGCCAA
501 TGCCTACGAA GCACTCGGCG GCTACCTCGA AGCCAAAGCC GACTTCTTCG
551 ACCCCGATGA GGCAGCCTGG ATAGGCAACC GCCACATCGA CCTCGCCATG
601 AGCAACACCG GCGTCATCAC CGCCTTCAAC CAATGCCGTT CCGCCCTGTT
651 TTACCGTTTG CGCGGCAAAC ACCGCCACCC GCGCACCGCC AAAATGCTGC
701 GCTACTACTT CGCCGCCCAA GACATCCACG AACGCATCAG CTCCGCCCAC
751 GTCGACTACC AAGAGATGTC CGAAAAATTC AAAAACACCG ACATCATCTT
801 CCGCATCCGC CGCCTGCTCG AAATGCAGGG GCAGGCGTGC CGCAACACCG
851 CCCAAGCCAT CCGGTCGGGC AAAGACTAcg tTTACAGCAA ACGCCTCGGA
901 CGCGCCATcg aaggctgCCG CCAGTCGCtg cgcctCCTTt cagacggcaA
951 CGACAGTCCC GACATCCGCC ACCTGAGccg CCTTCTCGAC AACCTCGgca
1001 GCGTcgacca gcagtTCcgc caactCCGAC ACAgcgactC CCCCGCcgaa
1051 Aacgaccgca tgggcgacaC CCGCATCGCC GCCCtcgaaa ccggcagctT
1101 caaaaaCAcc tggcaggCAA TCCGTCCGCa gctgaaCCTC GAATCatgCG
1151 TATTCCGCCA TGCCGTCCGC CTGTCCCTCG TCGTTGCCGC CGCCTGCACC
1201 ATCGTCgaag cCCTCAACCT CAACCTCGGC TACTGGATAC TGCTGACCGC
1251 CCTTTTCGTC TGCCAACCCA ACTACACCGC CACCAAAAGC CGCGTGTACC
1301 AACGCATCGC CGGCACCGTA CTCGGCGTAA TCGTCGGCTC GCTCGTCCCC
1351 TACTTCACCC CCTCCGTCGA AACCAAACTC TGGATTGTCA TCGCCGGTAC
1401 CACCCTGTTC TTCATGACCC GCACCTACAA ATACAGTTTC TCCACCTTCT
1451 TCATCACCAT TCAGGCACTG ACCAGCCTCT CCCTCGCAGG TTTGGACGTA
1501 TACGCCGCCA TGCCCGTGCG CATCATcgaC ACCATTATCG GCGCATCCCT
1551 TGCCTGGGCG GCGGTCAGCT ACCTGTGGCC AGACTGGAAA TACCTCACGC
1601 TCGAACGCAC CGCCGCCCTT GCCGTATGCA GCAGCGGCAC ATACCTCCAA
1651 AAAATTGCCG AACGCCTCAA AACCGGCGAA ACCGGCGACG ACATAGAATA
1701 CCGCATCACC CGCCGCCGCG CCCACGAACA CACCGCCGCC CTCAGCAGCA
1751 CCCTTTCCGA CATGAGCAGC GAACCCGCAA AATTCGCCGA CAGCCTGCAA
1801 CCCGGCTTTA CCCTGCTCAA AACCGGCTAC GCCCTGACCG GCTACATCTC
1851 CGCCCTCGGC GCATACCGCA GCGAAATGCA CGAAGAATGC AGCCCCGACT
1901 TTACCGCACA GTTCCACCTT GCCGCCGAAC ACACCGCCCA CATCTTCCAA
1951 CACCTGCCCG ACATGGGACC CGACGACTTT CAGACGGCAT TGGATACACT
2001 GCGCGGCGAA CTCGGCACCC TCCGCACCCG CAGCAGCGGA ACACAAAGCC
2051 ACATCCTCCT CCAACAGCTC CAACTCATCG CccgGCAACT CGAACCCTAC
2101 TACCGCGCCT ACCGACAAAT TCCGCACAGG CAGCCCCAAA ACGCAGCCTG
2151 A



This corresponds to the amino acid sequence <SEQ ID 112; ORF19ng-1>:



1 MKTPLLKPLL ITSLPVFASV FTAASIVWQL GEPKLAMPFV LGIIAGGLVD

51 LDNRLTGRLK NIIATVALFT LSSLTAQSTL GTGLPFILAM TLMTFGFTIL

101 GAVGLKYRTF AFGALAVATY TTLTYTPETY WLTNPFMILC GTVLYSTAII

151 LFQIILPHRP VQESVANAYE ALGGYLEAKA DFFDPDEAAW IGNRHIDLAM

201 SNTGVITAFN QCRSALFYRL RGKHRHPRTA KMLRYYFAAQ DIHERISSAH

251 VDYQEMSEKF KNTDIIFRIR RLLEMQGQAC RNTAQAIRSG KDYVYSKRLG

301 RAIEGCRQSL RLLSDGNDSP DIRHLSRLLD NLGSVDQQFR QLRHSDSPAE

351 NDRMGDTRIA ALETGSFKNT WQAIRPQLNL ESCVFRHAVR LSLVVAAACT

401 IVEALNLNLG YWILLTALFV CQPNYTATKS RVYQRIAGTV LGVIVGSLVP

451 YFTPSVETKL WIVIAGTTLF FMTRTYKYSF STFFITIQAL TSLSLAGLDV

501 YAAMPVRIID TIIGASLAWA AVSYLWPDWK YLTLERTAAL AVCSSGTYLQ

551 KIAERLKTGE TGDDIEYRIT RRRAHEHTAA LSSTLSDMSS EPAKFADSLQ 

601 PGFTLLKTGY ALTGYISALG AYRSEMHEEC SPDFTAQFHL AAEHTAHIFQ 

651 HLPDMGPDDF QTALDTLRGE LGTLRTRSSG TQSHILLQQL QLIARQLEPY 

701 YRAYRQIPHR QPQNAA* 

ORF19ng-1 and ORF19-1 show 95.5% identity in 716 aa overlap:



10 20 30 40 50 60 

orf 19-1 . pep MKTPLLKPLLITSLPVFASVFTAASIVWQLGEPKLAMPFVLGIIAGGLVDLDNRLTGRLK 
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I ( I I I I I I I I I I 
orfl9ng-l MKTPLLKPLL IT SLPVFASVFTAASIVWQLGEPKLAMPFVLGI I AGGLVDLDNRLTGRLK 

10 20 30 40 50 60 



70 80 90 100 110 120 

orf 19-1. pep NIITTVALFTLSSLTAQSTLGTGLPFILAMTLMTFGFTILGAVGLKYRTFAFGALAVATY 
I I I : I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I 
orfl9ng-l NIIATVALFTLSSLTAQSTLGTGLPFILAMTLMTFGFTILGAVGLKYRTFAFGALAVATY 

70 80 90 100 110 120 

130 140 150 160 170 180 

orf 19-1 . pep TTLTYTPETYWLTNPFMILCGTVLYSTAILLFQIVLPHRPVQESVANAYDALGGYLEAKA 
I I I I I II I I I I I I I I I I I I I I I I I I I I I I : I I I I : I I I I I I II I I I I II : I I I II I I I II 
orfl9ng-l TTLTYTPETYWLTNPFMILCGTVLYSTAIILFQIILPHRPVQESVANAYEALGGYLEAKA 

130 140 150 160 170 180 



190 200 210 220 230 240 

orf 19-1 . pep DFFDPDEAAWIGNRHIDLAMSNTGVITAFNQCRSALFYRLRGKHRHPRTAKMLRYYFAAQ 
I | | | | | I I I I I I I I I I I I I I I I I I I II II I I I I I II I I I I I I I I I I I I I II I I I II I I I I 
orfl9ng-l DFFDPDEAAWIGNRHIDLAMSNTGVITAFNQCRSALFYRLRGKHRHPRTAKMLRYYFAAQ 

190 200 210 220 230 240 



orf 19-1 .pep 



250 260 270 280 290 300 

DIHERISSAHVDYQEMSEKFKNTDIIFRIHRLLEMQGQACRNTAQALRASKDYVYSKRLG 
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10 



15 



20 



25 



30 



35 



40 



45 



50 



I | | | I I I I I I I I I I I I I I I I I I I I I I I I I : I I I I I I I I I I I I I I I I : I : : I I I I I I I I I I 
orfl9ng-l DIHERISSAHVDYQEMSEKFKNTDIIFRIRRLLEMQGQACRNTAQAIRSGKDYVYSKRLG 

250 260 270 280 290 300 

310 320 330 340 350 360 

orf19-1.pep RAIEGCRQSLRLLSDSNDSPDIRHLRRLLDNLGSVDQQFRQLQHNGLQAENDRMGDTRIA
I I I I I II M It M I I : I I I I II I M I I i II I I 1 I I I I I I II : I : llllllllllll 
orf19ng-1 RAIEGCRQSLRLLSDGNDSPDIRHLSRLLDNLGSVDQQFRQLRHSDSPAENDRMGDTRIA

310 320 330 340 350 360 

370 380 390 400 410 420 

orf19-1.pep ALETSSLKNTWQAIRPQLNLESGVFRHAVRLSLWAAACTIVEALNLNLGYWILLTALFV
I I I I : I : I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
orf19ng-1 ALETGSFKNTWQAIRPQLNLESCVFRHAVRLSLWAAACTIVEALNLNLGYWILLTALFV

370 380 390 400 410 420 

430 440 450 460 470 480 

orf19-1.pep CQPNYTATKSRVRQRIAGTVLGVIVGSLVPYFTPSVETKLWIVIASTTLFFMTRTYKYSF
llllllllllll I M I I M I I I I M M I t I I I M I I I i I I I I I I : I I I I I I II I I M I I 
orf19ng-1 CQPNYTATKSRVYQRIAGTVLGVIVGSLVPYFTPSVETKLWIVIAGTTLFFMTRTYKYSF

430 440 450 460 470 480 

490 500 510 520 530 540 

orf19-1.pep STFFITIQALTSLSLAGLDVYAAMPVRIIDTIIGASLAWAAVSYLWPDWKYLTLERTAAL
I || I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I II I I I I I I I I I I I I I I I I 
orf19ng-1 STFFITIQALTSLSLAGLDVYAAMPVRIIDTIIGASLAWAAVSYLWPDWKYLTLERTAAL

490 500 510 520 530 540 

550 560 570 580 590 600 

orf19-1.pep AVCSNGAYLEKITERLKSGETGDDVEYRATRRRAHEHTAALSSTLSDMSSEPAKFADSLQ
I I I I : I : I I : I I : I I I I : I I I I I I : I I I I I I I I I II I I I I I I I I I II I I I I I I I I I I I I 
orf19ng-1 AVCSSGTYLQKIAERLKTGETGDDIEYRITRRRAHEHTAALSSTLSDMSSEPAKFADSLQ

550 560 570 580 590 600 

610 620 630 640 650 660 

orf19-1.pep PGFTLLKTGYALTGYISALGAYRSEMHEECSPDFTAQFHLAAEHTAHIFQHLPETEPDDF

I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I II I I I I I I I I I I I II I I I : I I I I 
orf19ng-1 PGFTLLKTGYALTGYISALGAYRSEMHEECSPDFTAQFHLAAEHTAHIFQHLPDMGPDDF

610 620 630 640 650 660 

670 680 690 700 710 

orf19-1.pep QTALDTLRGELDTLRTHSSGTQSHILLQQLQLIARQLEPYYRAYRQIPHRQPQNAAX
1 I I 1 I I I 1 I 1 t I I I I : M I I M I I I I I I M II I I M I M I I I I I I I I M I I I M I I 
orf19ng-1 QTALDTLRGELGTLRTRSSGTQSHILLQQLQLIARQLEPYYRAYRQIPHRQPQNAAX

670 680 690 700 710 
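For this pair the quoted identity can be checked against the alignment itself. Excluding the terminal stop symbol, the alignment has no gaps and 32 mismatched columns, so 716 - 32 = 684 residues are identical and 100 * 684 / 716 is approximately 95.5%. A minimal Python sketch of that calculation, assuming an ungapped alignment of two equal-length strings (the function name `percent_identity` is illustrative, and the program that produced the alignment may count gaps or overlap length differently):

```python
def percent_identity(a: str, b: str) -> float:
    """Return the percentage of aligned positions at which a and b agree.

    Assumes an ungapped alignment: both strings have the same length and
    position i of a is aligned with position i of b. Any terminal stop
    symbol ('X' or '*') should be stripped before calling.
    """
    if len(a) != len(b) or not a:
        raise ValueError("expected two non-empty aligned sequences of equal length")
    identical = sum(1 for x, y in zip(a, b) if x == y)
    return 100.0 * identical / len(a)


# Residues 61-70 of orf19-1.pep and orf19ng-1 differ only at residue 64 (T/A).
print(percent_identity("NIITTVALFT", "NIIATVALFT"))  # 90.0

# Whole alignment: 684 identical residues in a 716 aa overlap.
print(round(100.0 * 684 / 716, 1))  # 95.5
```

For gapped alignments the choice of denominator (aligned columns versus residues in the shorter sequence) changes the result, so identity figures from different tools are not always directly comparable.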

In addition, ORF19ng-1 shows significant homology to a hypothetical gonococcal protein
previously entered in the databases: 

sp|O33369|YOR2_NEIGO HYPOTHETICAL 45.5 KD PROTEIN (ORF2) gnl|PID|e1154438
(AJ002423) hypothetical protein [Neisseria gonorrh] Length = 417

Score = 1512 (705.6 bits), Expect = 5.3e-203, P = 5.3e-203

Identities = 301/326 (92%), Positives = 306/326 (93%) 






Query: 307

RQSLRLLSDGNDS DIRHLSRLLDNLGSVDQQFRQLRHSDSPAENDRMGDTRIAALETGS
Sbjct: 1

Query: 367

FKNTWQAI RPQLNLE S VFRHAVRLSLWAAACT IVEALNLNLGYWI LLT LFVCQPNYT
Sbjct: 61

Query: 427

ATKSRVYQRIAGTVLGVIVGSLVPYFTPSVETKLWIVIAGTTLFFMTRTYKYSFSTFFIT
Sbjct: 121

Query: 487

IQALTSLSLAGLDVYAAMPVRIIDTIIGASLAWAAVSYLWPDWKYLTLERTAALAVCSSG
Sbjct: 181

Query: 547 TYLQKIAERLKTGETGDDIEYRITRRRAHEHTAALSSTLSDMSSEPAKFADSLQPGFTLL 606

TYLQKIAERLKTGETGDDIEYRITRRRAHEHTAALSSTLSDMSSEPAKFAD+ P 
Sbjct: 241 TYLQKIAERLKTGETGDDIEYRITRRRAHEHTAALSSTLSDMSSEPAKFADTCNPALPCS 300 

Query: 607 KTGYALTGYISALGAYRSEMHEECSP 632 

K ALTGYISALG ++ + +P 
Sbjct: 301 KPATALTGYISALGHTAAKCTKNAAP 326 

Based on this analysis, including the presence of several putative transmembrane domains in the 
gonococcal protein (the first of which is also seen in the meningococcal protein), and on homology 
with the YHFK protein, it is predicted that the proteins from N. meningitidis and N. gonorrhoeae, 
and their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies. 



Example 14 

The following DNA sequence, believed to be complete, was identified in N. meningitidis <SEQ ID
113>:



1 ATGAATATGC TGGGAGCTTT GGCAAAAGTC GGCAGCCTGA CGATGGTGTC

51 GCGCGTTTTG GGATTTGTGC GCGATACGGT CATTGCGCGG GCATTCGGCG

101 CGGGTATGGC GACGGATGCG TTTTTTGTCG CGTTCAAACT GCCCAACCTG

151 CTTCGCCGCG TGTTTGCGGA GGGGGCGTTT GCCCAAGCGT TTGTGCCGAT

201 TTTGGCGGAA TACAAGGAAA CGCGTTCAAA AGAGGCGG.C GAAGCCTTTA

251 TCCGCCATGT GGCGGGGATG CTGTCGTTTG TACTGGTTAT CGTTACCGCG

301 CTGGGCATAC TTGCCGCGCC TTGGGTGATT TATGTTTCCG CACCCgAGTT

351 TTGCCCAAGA TGCCGACAAA TTTCAGCTCT CCATCGATTT GCTGCGGATT

401 ACGTTTCCTT ATATATTATT GATTTCCCTG TCTTCATTTG TCGGCTCGGT

451 ACTCAATTCT TATCATAAGT TCGGCATTCC GGCGTTTACG CCAC.GTTTC

501 TGAACGTGTC GTTTATCGTA TTCGCGCTGT TTTTCGTGCC GTATTTCGAT

551 CCGCCCGTTA CCGCGCyGGC GTGGGCGGTC TTTGTCGGCG GCATTTTGCA

601 ACTCGrmTTC CAACTGCCCT GGCTGGCGAA ACTGGGCTTT TTGAAACTGC

651 CCAAACtGAG TTTCAAAGAT GCGGCGGTCA ACCGCGTGAT GAAACAGATG

701 GCGCCTGCgA TTTTgGGCGT GAgCGTGGCG CAGGTTTCTT TGGTGATCAA

751 CACGATTTTc GCGTCTTATC TGCAATCGGG CAGCGTTTCA TGGATGTATT

801 ACGCCGACCG CATGATGGAG CTGCCCAGCG GCGTGCTGGG GGCGGCACTC

851 GGTACGATTT TGCTGCCGAC TTTGTCCAAA CACTCGGCAA ACCaAGATAC

901 GGaACAGTTT TCCGCCCTGC TCGACTGGGG TTTGCGCCTG TGCATGCtgc

951 TGACGCTGCC GGCGgcGGTC GGACTGGCGG TGTTGTCGTT cCCgCtGGTG

1001 GCGACGCTGT TTATGTACCG CGwATTTACG CTGTTTGACG CGCAGATGAC

1051 GCAACACGCG CTGATTGCCT ATTCTTTCGG TTTAATCGGC TTAATCATGA

1101 TTAAAGTGTT GGCACCCGGC TTCTATGCGC GGCAAAACAT CAAwAraGCCC

1151 GTCAAAATCG CCATCTTCAC GCTCATCTGC mCGCAGTTGA TGAACCTTGs

1201 CTTTAyCGGC CCACTrrAAC rCajTCGGAC TTTCGCTTGC CATCGGTCTG

1251 GGCGCGTGTA TCAATGCCGG ATTGTTGTTT TACCTGTTGC GCAGACACGG

1301 TATTTACCAA CCTGG.CAAG GGTTGGGCAG CGTTCTT.AG CAAAAATGCT

1351 GcTCTCGCTC GCCGTGA



This corresponds to the amino acid sequence <SEQ ID 114; ORF20>:



1 MNMLGALAKV GSLTMVSRVL GFVRDTVIAR AFGAGMATDA FFVAFKLPNL 

51 LRRVFAEGAF AQAFVPILAE YKETRSKEAX EAFIRHVAGM LSFVLVIVTA 

101 LGILAAPWVI YVSAPSFAQD ADKFQLSIDL LRITFPYILL ISLSSFVGSV 

151 LNSYHKFGIP AFTPXFLNVS FIVFALFFVP YFDPPVTAXA WAVFVGGILQ 

201 LXFQLPWLAK LGFLKLPKLS FKDAAVNRVM KQMAPAILGV SVAQVSLVIN 

251 TIFASYLQSG SVSWMYYADR MMELPSGVLG AALGTILLPT LSKHSANQDT 

301 EQFSALLDWG LRLCMLLTLP AAVGLAVLSF PLVATLFMYR XFTLFDAQMT 

351 QHALIAYSFG LIGLIMIKVL APGFYARQNI XXPVKIAIFT LICXQLMNLX 

401 FXGPLXXIGL SLAIGLGACI NAGLLFYLLR RHGIYQPXQG LGSVLXQKCC 

451 SRSP* 
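The amino acid sequences in these examples are conceptual translations of the nucleotide sequences, read codon by codon from the initiating ATG with the standard genetic code. Ambiguous base calls (lowercase IUPAC codes such as y, r, w or m in SEQ ID 113) do not specify a single codon, which appears consistent with the X residues in ORF20. A minimal Python sketch of such a translation, assuming NCBI translation table 1 and mapping any codon that contains a character other than A, C, G or T to X (the names `CODON_TABLE` and `translate` are illustrative, not taken from the source):

```python
from itertools import product

# Standard genetic code (NCBI translation table 1). Codons are enumerated
# with the first base varying slowest, in T, C, A, G order; '*' marks stops.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate a DNA sequence in frame 1, ignoring whitespace and case.

    Any codon containing a character other than A, C, G or T (an IUPAC
    ambiguity code, or a '.' placeholder counted as one unknown base) is
    translated as 'X'. A trailing incomplete codon is ignored.
    """
    seq = "".join(dna.split()).upper()
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


# First 30 nt of SEQ ID 115 give the first ten residues of ORF20-1.
print(translate("ATGAATATGC TGGGAGCTTT GGCAAAAGTC"))  # MNMLGALAKV
print(translate("ATG TAy TGA"))  # MX*
```

The sketch does not model frame shifts or sequencing errors; the last nine bases of SEQ ID 115 (GAAAACTGA), for example, translate to EN*, the C-terminus of ORF20-1.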

These sequences were elaborated, and the complete DNA sequence <SEQ ID 115> is:



1 ATGAATATGC TGGGAGCTTT GGCAAAAGTC GGCAGCCTGA CGATGGTGTC 
51 GCGCGTTTTG GGATTTGTGC GCGATACGGT CATTGCGCGG GCATTCGGCG 
101 CGGGTATGGC GACGGATGCG TTTTTTGTCG CGTTCAAACT GCCCAACCTG

151 CTTCGCCGCG TGTTTGCGGA GGGGGCGTTT GCCCAAGCGT TTGTGCCGAT 

201 TTTGGCGGAA TACAAGGAAA CGCGTTCAAA AGAGGCGGCG GAGGCTTTTA 

251 TCCGCCATGT GGCGGGGATG CTGTCGTTTG TACTGGTTAT CGTTACCGCG 

301 CTGGGCATAC TTGCCGCGCC TTGGGTGATT TATGTTTCCG CACCCGGTTT 

351 TGCCCAAGAT GCCGACAAAT TTCAGCTCTC CATCGATTTG CTGCGGATTA 

401 CGTTTCCTTA TATATTATTG ATTTCCCTGT CTTCATTTGT CGGCTCGGTA

451 CTCAATTCTT ATCATAAGTT CGGCATTCCG GCGTTTACGC CCACGTTTCT 

501 GAACGTGTCG TTTATCGTAT TCGCGCTGTT TTTCGTGCCG TATTTCGATC 

551 CGCCCGTTAC CGCGCTGGCG TGGGCGGTCT TTGTCGGCGG CATTTTGCAA 

601 CTCGGCTTCC AACTGCCCTG GCTGGCGAAA CTGGGCTTTT TGAAACTGCC 

651 CAAACTGAGT TTCAAAGATG CGGCGGTCAA CCGCGTGATG AAACAGATGG 

701 CGCCTGCGAT TTTGGGCGTG AGCGTGGCGC AGGTTTCTTT GGTGATCAAC 

751 ACGATTTTCG CGTCTTATCT GCAATCGGGC AGCGTTTCAT GGATGTATTA 

801 CGCCGACCGC ATGATGGAGC TGCCCAGCGG CGTGCTGGGG GCGGCACTCG 

851 GTACGATTTT GCTGCCGACT TTGTCCAAAC ACTCGGCAAA CCAAGATACG 

901 GAACAGTTTT CCGCCCTGCT CGACTGGGGT TTGCGCCTGT GCATGCTGCT 

951 GACGCTGCCG GCGGCGGTCG GACTGGCGGT GTTGTCGTTC CCGCTGGTGG 

1001 CGACGCTGTT TATGTACCGC GAATTTACGC TGTTTGACGC GCAGATGACG 

1051 CAACACGCGC TGATTGCCTA TTCTTTCGGT TTAATCGGCT TAATCATGAT 

1101 TAAAGTGTTG GCACCCGGCT TCTATGCGCG GCAAAACATC AAAACGCCCG 

1151 TCAAAATCGC CATCTTCACG CTCATCTGCA CGCAGTTGAT GAACCTTGCC 

1201 TTTATCGGCC CACTGAAACA CGTCGGACTT TCGCTTGCCA TCGGTCTGGG 

1251 CGCGTGTATC AATGCCGGAT TGTTGTTTTA CCTGTTGCGC AGACACGGTA 

1301 TTTACCAACC TGGCAAGGGT TGGGCAGCGT TCTTAGCAAA AATGCTGCTC 

1351 TCGCTCGCCG TGATGTGCGG CGGACTGTGG GCAGCGCAGG CTTACCTGCC 

1401 GTTTGAATGG GCGCACGCCG GCGGAATGCG GAAAGCGGGG CAGCTCTGCA 

1451 TCCTGATTGC CGTCGGCGGC GGACTGTATT TCGCATCACT GGCGGCTTTG

1501 GGCTTCCGTC CGCGCCATTT CAAACGCGTG GAAAACTGA 

This corresponds to the amino acid sequence <SEQ ID 116; ORF20-1>:



1 MNMLGALAKV GSLTMVSRVL GFVRDTVIAR AFGAGMATDA FFVAFKLPNL 

51 LRRVFAEGAF AQAFVPILAE YKETRSKEAA EAFIRHVAGM LSFVLVIVTA

101 LGILAAPWVI YVSAPGFAQD ADKFQLSIDL LRITFPYILL ISLSSFVGSV

151 LNSYHKFGIP AFTPTFLNVS FIVFALFFVP YFDPPVTALA WAVFVGGILQ

201 LGFQLPWLAK LGFLKLPKLS FKDAAVNRVM KQMAPAILGV SVAQVSLVIN

251 TIFASYLQSG SVSWMYYADR MMELPSGVLG AALGTILLPT LSKHSANQDT

301 EQFSALLDWG LRLCMLLTLP AAVGLAVLSF PLVATLFMYR EFTLFDAQMT

351 QHALIAYSFG LIGLIMIKVL APGFYARQNI KTPVKIAIFT LICTQLMNLA

401 FIGPLKHVGL SLAIGLGACI NAGLLFYLLR RHGIYQPGKG WAAFLAKMLL

451 SLAVMCGGLW AAQAYLPFEW AHAGGMRKAG QLCILIAVGG GLYFASLAAL

501 GFRPRHFKRV EN* 

Computer analysis of this amino acid sequence gave the following results: 



Homology with the MviN virulence factor of S. typhimurium (accession number P37169)
ORF20 and MviN proteins show 63% aa identity in 440aa overlap: 

Orf20 1 MNMLGALAKVGSLTMVSRVLGFVRDTVIARAFGAGMATDAFFVAFKLPNLLRRVFAEGAF 60 

MN+L +LA V S+TM SRVLGF RD ++AR FGAGMATDAFFVAFKLPNLLRR+FAEGAF 
MviN 14 MNLLKSLAAVSSMTMFSRVLGFARDAIVARIFGAGMATDAFFVAFKLPNLLRRIFAEGAF 73 



Orf20 61 AQAFVPILAEYKETRSKEAXEAFIRHVAGMLSFVLVIVTALGILAAPWVIYVSAPSFAQD 120

+QAFVPILAEYK + +EA F+ +V+G+L+ L +VT G+LAAPWVI V+AP FA 
MviN 74 SQAFVPILAEYKSKQGEEATRIFVAYVSGLLTLALAVVTVAGMLAAPWVIMVTAPGFADT 133

Orf20 121 ADKFQLSIDLLRITFPYILLISLSSFVGSVLNSYHKFGIPAFTPXFLNVSFIVFALFFVP 180 

ADKF L+ LLRITFPYILLISL+S VG++LN++++F IPAF P FLN+S I FALF P 
MviN 134 ADKFALTTQLLRITFPYILLISLASLVGAILNTWNRFSIPAFAPTFLNISMIGFALFAAP 193 

Orf20 181 YFDPPVTAXAWAVFVGGILQLXFQLPWLAKLGFLKLPKLSFKDAAVNRVMKQMAPAILGV 240 

YF+PPV A AWAV VGG+LQL +QLP+L K+G L LP+++F+D RV+KQM PAILGV 
MviN 194 YFNPPVLALAWAVTVGGVLQLVYQLPYLKKIGMLVLPRINFRDTGAMRVVKQMGPAILGV 253

Orf20 241 SVAQVSLVINTIFASYLQSGSVSWMYYADRMMELPSGVLGAALGTILLPTLSKHSANQDT 300 

SV+Q+SL+INTIFAS+L SGSVSWMYYADR+ME PSGVLG ALGTILLP+LSK A+ + 
MviN 254 SVSQISLIINTIFASFLASGSVSWMYYADRLMEFPSGVLGVALGTILLPSLSKSFASGNH 313 
Orf20 301 EQFSALLDWGLRLCMLLTLPAAVGLAVLSFPLVATLFMYRXFTLFDAQMTQHALIAYSFG 360

+++ L+DWGLRLC LL LP+AV L +L+ PL +LF Y FT FDA MTQ ALIAYS G
MviN 314 DEYCRLMDWGLRLCFLLALPSAVALGILAKPLTVSLFQYGKFTAFDAAMTQRALIAYSVG 373 

Orf20 361 LIGLIMIKVLAPGFYARQNIXXPVKIAIFTLICXQLMNLXFXXXXXXXXXXXXXXXXXCI 420 

LIGLI++KVLAPGFY+RQ+I PVKIAI TLI QLMNL F C+ 
MviN 374 LIGLIVVKVLAPGFYSRQDIKTPVKIAIVTLIMTQLMNLAFIGPLKHAGLSLSIGLAACL 433



Orf20 421 NAGLLFYLLRRHGIYQPXQG 440 

NA LL++ LR+ I+ P G
MviN 434 NASLLYWQLRKQNIFTPQPG 453



Homology with a predicted ORF from N. meningitidis (strain A)

ORF20 shows 93.5% identity over a 447aa overlap with an ORF (ORF20a) from strain A of N. 
meningitidis: 

10 20 30 40 50 60 

orf20.pep MNMLGALAKVGSLTMVSRVLGFVRDTVIARAFGAGMATDAFFVAFKLPNLLRRVFAEGAF

I I I I II I : I I I I I I I I ! II I I I I I I I II I I I I I I I I I I I I I I I I I I II I I I I I I 1 II I I I 
orf20a MNMLGALVKVGSLTMVSRVLGFVRDTVIARAFGAGMATDAFFVAFKLPNLLRRVFAEGAF

10 20 30 40 50 60 



70 80 90 100 110 120 

orf20.pep AQAFVPILAEYKETRSKEAXEAFIRHVAGMLSFVLVIVTALGILAAPWVIYVSAPSFAQD
I I I I I I I I I I I I I I I I I I I : I II I I I II I I I I II I I I I I I I I I II I I I I I II I II : I I : I 
orf20a AQAFVPILAEYKETRSKEATEAFIRHVAGMLSFVLVIVTALGILAAPWVIYVSAPGFAKD

70 80 90 100 110 120 



130 140 150 160 170 180 

orf20.pep ADKFQLSIDLLRITFPYILLISLSSFVGSVLNSYHKFGIPAFTPXFLNVSFIVFALFFVP
I I I I I I I I I I I I I I I I I I M I I II I I I I I I I I I I I i I : I II I I I : I I I I I I I I I I I I II I 
orf20a ADKFQLSIDLLRITFPYILLISLSSFVGSVLNSYHKFSIPAFTPTFLNVSFIVFALFFVP

130 140 150 160 170 180 



190 200 210 220 230 240 

orf20.pep YFDPPVTAXAWAVFVGGILQLXFQLPWLAKLGFLKLPKLSFKDAAVNRVMKQMAPAILGV
I I I I I I I I I I I I II I I I II I I I I I I I I I I II I I II I I I I I I I I I I I I I I I I I I I I I I I 
orf20a YFDPPVTALAWAVFVGGILQLGFQLPWLAKLGFLKLPKLSFKDAAVNRVMKQMAPAILGV

190 200 210 220 230 240 



250 260 270 280 290 300 

orf20.pep SVAQVSLVINTIFASYLQSGSVSWMYYADRMMELPSGVLGAALGTILLPTLSKHSANQDT
I I I I : I I I I I I I I II II I I I I I I I I I I I II I I II I : I I M I I I I I 1 I I I I I I I I I I II I I 
orf20a SVAQISLVINTIFASYLQSGSVSWMYYADRMMELPGGVLGAALGTILLPTLSKHSANQDT

250 260 270 280 290 300 



310 320 330 340 350 360 

orf20.pep EQFSALLDWGLRLCMLLTLPAAVGLAVLSFPLVATLFMYRXFTLFDAQMTQHALIAYSFG
I I I I I I I I I I I I I I I M I I I I I I : I I I I I I I I I I II I I I I I I II I II I I I I I I I I I I I 
orf20a EQFSALLDWGLRXCMLLTLPAAVGMAVLSFPLVATLFMYREFTLFDAQMTQHALIAYSFG

310 320 330 340 350 360 



370 380 390 400 410 420 

orf20.pep LIGLIMIKVLAPGFYARQNIXXPVKIAIFTLICXQLMNLXFXGPLXXIGLSLAIGLGACI
I I I I I I I I I I I I I I I I II I I : I I I I I I I I I I I : I I I I I I III : II I I I I I I I I I I 
orf20a LIGLIMIKVLAPGFYARQNIKTPVKIAIFTLICTQLMNLAFIGPLKHVGLSLAIGLGACI

370 380 390 400 410 420 



430 440 450 

orf20.pep NAGLLFYLLRRHGIYQPXQGLGSVLXQKCCSRSPX

I I I I M I I I II I I I I I I : I :: I : 
orf20a NAGLLFYLLRRHGIYQPGKGWAAFLAKMLLSLAVMGGGLYAAQIWLPFDWAHAGGMQKAA

430 440 450 460 470 480 

The full-length ORF20a nucleotide sequence <SEQ ID 117> is:

1 ATGAATATGC TGGGAGCTTT GGTAAAAGTC GGCAGCCTGA CGATGGTGTC 
51 GCGCGTTTTG GGATTTGTGC GCGATACGGT CATTGCGCGC GCATTCGGCG 
101 CAGGCATGGC GACGGATGCG TTCTTTGTCG CGTTCAAACT GCCCAACCTG 
151 CTTCGCCGCG TGTTTGCGGA GGGGGCGTTT GCCCAAGCGT TTGTGCCGAT

201 TTTGGCGGAA TATAAGGAAA CGCGTTCTAA AGAGGCGACG GAGGCTTTTA 

251 TCCGCCATGT GGCGGGGATG CTGTCGTTTG TACTGGTCAT CGTTACCGCG 

301 CTGGGCATAC TTGCCGCGCC TTGGGTGATT TATGTTTCCG CACCCGGTTT 

351 TGCCAAAGAT GCCGACAAAT TTCAGCTCTC TATCGATTTG CTGCGGATTA 

401 CGTTTCCTTA TATCTTATTG ATTTCACTTT CCTCTTTTGT CGGCTCGGTA

451 CTCAATTCCT ATCATAAATT CAGCATTCCT GCGTTTACGC CCACGTTCCT 

501 GAACGTGTCG TTTATCGTAT TCGCGCTGTT TTTCGTGCCG TATTTCGATC 

551 CTCCCGTTAC CGCGCTGGCT TGGGCGGTTT TTGTCGGCGG CATTTTGCAA 

601 CTCGGCTTCC AACTGCCCTG GCTGGCGAAA CTGGGTTTTT TGAAACTGCC 

651 CAAACTGAGT TTCAAAGATG CGGCGGTCAA CCGCGTGATG AAACAGATGG 

701 CGCCTGCGAT TTTGGGCGTG AGCGTGGCGC AGATTTCTTT GGTGATCAAC 

751 ACGATTTTCG CGTCTTATCT GCAATCGGGC AGCGTTTCAT GGATGTATTA 

801 CGCCGACCGC ATGATGGAAC TGCCCGGCGG CGTGCTGGGG GCGGCACTCG 

851 GTACGATTTT GCTGCCGACT TTGTCCAAAC ACTCGGCAAA CCAAGATACG 

901 GAACAGTTTT CCGCCCTGCT CGACTGGGGT TTGCGCNTGT GCATGCTGCT 

951 GACGCTGCCG GCGGCGGTCG GAATGGCGGT GTTGTCGTTC CCGCTGGTGG 

1001 CAACCTTGTT TATGTACCGA GAATTCACGC TGTTTGACGC GCAGATGACG 

1051 CAACACGCGC TGATTGCCTA TTCTTTCGGT TTAATCGGTT TAATCATGAT 

1101 TAAAGTGTTG GCGCCCGGCT TTTATGCGCG GCAAAACATC AAAACGCCCG 

1151 TCAAAATCGC CATCTTCACG CTCATTTGCA CGCAGTTGAT GAACCTTGCC 

1201 TTTATCGGCC CACTGAAACA CGTCGGACTT TCGCTTGCCA TCGGTCTGGG 

1251 CGCGTGTATC AATGCCGGAT TGTTGTTTTA CCTGTTGCGC AGACACGGTA 

1301 TTTACCAACC TGGCAAGGGT TGGGCAGCGT TCTTGGCAAA AATGCTGCTC 

1351 TCGCTCGCCG TGATGGGAGG CGGCCTGTAT GCCGCCCAAA TCTGGCTGCC 

1401 GTTCGACTGG GCACACGCCG GCGGAATGCA AAAGGCCGCC CGGCTCTTCA

1451 TCCTGATTGC CGTCGGCGGC GGACTGTATT TCGCATCACT GGCGGCTTTG

1501 GGCTTCCGTC CGCGCCATTT CAAACGCGTG GAAAGCTGA 

This encodes a protein having amino acid sequence <SEQ ID 118>:

1 MNMLGALVKV GSLTMVSRVL GFVRDTVIAR AFGAGMATDA FFVAFKLPNL 

51 LRRVFAEGAF AQAFVPILAE YKETRSKEAT EAFIRHVAGM LSFVLVIVTA

101 LGILAAPWVI YVSAPGFAKD ADKFQLSIDL LRITFPYILL ISLSSFVGSV

151 LNSYHKFSIP AFTPTFLNVS FIVFALFFVP YFDPPVTALA WAVFVGGILQ

201 LGFQLPWLAK LGFLKLPKLS FKDAAVNRVM KQMAPAILGV SVAQISLVIN

251 TIFASYLQSG SVSWMYYADR MMELPGGVLG AALGTILLPT LSKHSANQDT

301 EQFSALLDWG LRXCMLLTLP AAVGMAVLSF PLVATLFMYR EFTLFDAQMT

351 QHALIAYSFG LIGLIMIKVL APGFYARQNI KTPVKIAIFT LICTQLMNLA

401 FIGPLKHVGL SLAIGLGACI NAGLLFYLLR RHGIYQPGKG WAAFLAKMLL

451 SLAVMGGGLY AAQIWLPFDW AHAGGMQKAA RLFILIAVGG GLYFASLAAL

501 GFRPRHFKRV ES* 

ORF20a and ORF20-1 show 96.5% identity in 512 aa overlap: 

10 20 30 40 50 60 

orf20a.pep MNMLGALVKVGSLTMVSRVLGFVRDTVIARAFGAGMATDAFFVAFKLPNLLRRVFAEGAF
I I I I I I I : I I I I I I I I I I I I I I I I I I I I I I I I I I I I I i I I I I I I I I I I I I I I I I I I I I I I 
orf20-1 MNMLGALAKVGSLTMVSRVLGFVRDTVIARAFGAGMATDAFFVAFKLPNLLRRVFAEGAF
10 20 30 40 50 60 
orf20a.pep AQAFVPILAEYKETRSKEATEAFIRHVAGMLSFVLVIVTALGILAAPWVIYVSAPGFAKD
I M I Mil I II I II I 1111:11 III 111 Mill Ml I II Mill II I! I I II MM 11:1 
orf20-1 AQAFVPILAEYKETRSKEAAEAFIRHVAGMLSFVLVIVTALGILAAPWVIYVSAPGFAQD

130 140 150 160 170 180

orf20a.pep ADKFQLSIDLLRITFPYILLISLSSFVGSVLNSYHKFSIPAFTPTFLNVSFIVFALFFVP

I M I M I M M M M M I II M M M M M M I I M I : I M II I M M M I M M M M I 
orf20-1 ADKFQLSIDLLRITFPYILLISLSSFVGSVLNSYHKFGIPAFTPTFLNVSFIVFALFFVP
130 140 150 160 170 180 

190 200 210 220 230 240 

orf20a.pep YFDPPVTALAWAVFVGGILQLGFQLPWLAKLGFLKLPKLSFKDAAVNRVMKQMAPAILGV

I I I I I II I I I I II I I I II I I I I I I I I I I I II I I I I I I I I I I M I I I I I I I I I II I I I II I 
orf20-1 YFDPPVTALAWAVFVGGILQLGFQLPWLAKLGFLKLPKLSFKDAAVNRVMKQMAPAILGV
190 200 210 220 230 240 

250 260 270 280 290 300 

orf20a.pep SVAQISLVINTIFASYLQSGSVSWMYYADRMMELPGGVLGAALGTILLPTLSKHSANQDT
I I I I : I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I : I I I I I I I I I I I I I I I I I I I I I I It
orf20-1 SVAQVSLVINTIFASYLQSGSVSWMYYADRMMELPSGVLGAALGTILLPTLSKHSANQDT

250 260 270 280 290 300 

310 320 330 340 350 360 

orf20a.pep EQFSALLDWGLRXCMLLTLPAAVGMAVLSFPLVATLFMYREFTLFDAQMTQHALIAYSFG
I I I I I I I I I I I I I I I I I I I I I I I : I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
orf20-1 EQFSALLDWGLRLCMLLTLPAAVGLAVLSFPLVATLFMYREFTLFDAQMTQHALIAYSFG

310 320 330 340 350 360

370 380 390 400 410 420 

orf20a.pep LIGLIMIKVLAPGFYARQNIKTPVKIAIFTLICTQLMNLAFIGPLKHVGLSLAIGLGACI
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I 1 I I I I I I I I 
orf20-1 LIGLIMIKVLAPGFYARQNIKTPVKIAIFTLICTQLMNLAFIGPLKHVGLSLAIGLGACI

370 380 390 400 410 420 

430 440 450 460 470 480 

orf20a.pep NAGLLFYLLRRHGIYQPGKGWAAFLAKMLLSLAVMGGGLYAAQIWLPFDWAHAGGMQKAA
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I 111:111 : I I I : I I I I I I I: I i : 
orf20-1 NAGLLFYLLRRHGIYQPGKGWAAFLAKMLLSLAVMCGGLWAAQAYLPFEWAHAGGMRKAG

430 440 450 460 470 480 

490 500 510 

orf20a.pep RLFILIAVGGGLYFASLAALGFRPRHFKRVESX
: I I I I I I I I I I I I I II I I 1 I I I I I I I I II I : I 
orf20-1 QLCILIAVGGGLYFASLAALGFRPRHFKRVENX

490 500 510 

Homology with a predicted ORF from N. gonorrhoeae

ORF20 shows 92.1% identity over a 454aa overlap with a predicted ORF (ORF20ng) from N. 
gonorrhoeae: 
MNMLGALAKVGSLTMVSRVLGFVRDTVIARAFGAGMATDAFFVAFKLPNLLRRVFAEGAF
MNMLGALAKVGSLTMVSRVLGFVRDTVIARAFGAGMATDAFFVAFKLPNLLRRVFAEGAF
AQAFVPILAEYKETRSKEATEAFIRHVAGMLSFVLIVVTALGILAAPWVIYVSAPGFTKD 120
orf20ng NAGLLFFLFRKHGIYRPGQGLGQPSWRKCCSRSP*

An ORF20ng nucleotide sequence <SEQ ID 119> was predicted to encode a protein having amino
acid sequence <SEQ ID 120>: 
1 MNMLGALAKV GSLTMVSRVL GFVRDTVIAR AFGAGMATDA FFVAFKLPNL

51 LRRVFAEGAF AQAFVPILAE YKETRSKEAT EAFIRHVAGM LSFVLIVVTA

101 LGILAAPWVI YVSAPGFTKD ADKFQLSISL LRITFPYILL ISLSSFVGSI

151 LNSYHKFGIP AFTPTFLNIS FIVFALFFVP YFDPPVTALA WAVFVGGILQ

201 LGFQLPWLAK LGFLKLPKLN FKDAAVNRVM KQMAPAILGV SVAQISLVIN

251 TIFASYLQSG SVSWMYYADR MMELPGGVLG AALGTILLPT LSKHSANQDT

301 EQFSALLDWG LRLCMLLTLP AAAGLAVLSF PLVATLFMYR EFTLFDAQMT 

351 QHALIAYSFG LIGLIMIKVL ASGFYARQNI KTPVKIAIFT LICTQLMNLA 

401 FIGPLKHAGL SLAIGLGACI NAGLLFFLFR KHGIYRPGQG LGQPSWRKCC 

451 SRSP* 

Further DNA sequence analysis revealed the following DNA sequence <SEQ ID 121>: 



1 ATGAATATGC TTGGAGCTTT GGCAAAAGTC GGCAGCCTGA CGATGGTGTC 

51 GCGCGTTTTG GGATTTGTGC GCGATACGGT CATTGCGCGG GCATTCGGCG 

101 CGGGTATGGC GACGGATGCG TTTTTTGTCG CGTTCAAACT GCCCAACCTG 

151 CTTCGCCGCG TGTTTGCGGA GGGGGCGTTT GCCCAAGCGT TTGTGCCGAT 

201 TTTGGCGGAA TATAAGGAAA CGCGTTCTAA AGAGGCGAcg gAGGCTTTTA 

251 TCCGCCACGt tgcgggAatg CTGTCGTTTG TGCTGATcgt cGttacCGCG 

301 CTGGGCATAC TTGCCGCgcc tTGGGTGATT TATGTTtccg CgcccGGCTT 

351 TACCAAAGAC GCGGACAAGT TCCAACTTTC CATCAGCCTG CTGCGGATTA 

401 CGTTTCCTTA TATATTATTG ATTTCTTTGT CTTCTTTTGT CGGCTCGATA 

451 CTCAATTCCT ACCATAAGTT CGGCATTCCC GCGTTTACGC CCACGTTTTT 

501 AAACATCTCT TTTATCGTAT TCGCACTGTT TTTCGTGCCG TATTTCGATC 

551 CGCCCGTTAC CGCGCTGGCG TGGGCGGTTT TTGTCGGCGG TATTTTGCAG 

601 CTCGGTTTCC AACTGCCGTG GCTGGCGAAA CTGGGCTTTT TGAAACTGCC 

651 CAAACTGAAT TTCAAAGATG CGGCGGTCAA CCGCGTCATG AAACAGATGG 

701 CGCCTGCGAT TTTGGGCGTG agcgTGGCGC AAATTTCTTT GgttATCAAC 

751 ACGATTTTCG CGTCTTATCT GCAATCGGGC AGCGTTTCAT GGATGTatta 

801 cgCCGACCGC ATGATGGAGc tgcgccGGGG CGTGCTGGGG GCTGCACTCG 

851 GTACAATTTT GCTGCCGACT TTGTCCAAAC ACTCGGCAAA CCAAGATACG 

901 GAACAGTTTT CCGCCCTGCT CGACTGGGGT TTGCGCCTGT GCATGCTGCT 

951 GACGCTGCCG GCGGCGGccg GACTGGCGGT ATTGTCGTTC CCGCTGGTGG 

1001 CGACGCTGTT TATGTACCGA GAATTCACGC TGTTTGACGC ACAAATGACG 

1051 CAACACGCGC TGATTGCCTA TTCTTTCGGT TTAATCGGTT TAATTATGAT 

1101 TAAAGTGTTG GCATCCGGCT TTTATGCGCG GCAAAACATC AAAACGCCCG 

1151 TCAAAATCGC CATCTTCACG CTCATCTGCA CGCAGTTGAT GAACCTCGCC 

1201 TTTATCGGTC CGTTGAAACA CGCCGGGCTT TCGCTCGCCA TCGGCCTGGG 

1251 CGCGTGCATC AACGCCGGAT TGTTGTTCTT CCTGTTGCGC AAACACGGTA 

1301 TTTACCGGCC cggcaggggt tgggcggcgt TCTTGGCGAA AATGCTGCTC 

1351 GCGCTCGCCG TGATGTGCGG CGGACTGTGG GCGGCGCAGG CTTGCCTGCC 

1401 GTTCGAATGG GCGCACGCCG GCGGAATGCG GAAAGCGGGG CAGCTCTGCA 

1451 TCCTGATTGC CGTCGGCGGC GGACTGTATT TCGCATCTCT GGCGGCTTTG 

1501 GGCTTCCGTC CGCGCCATTT CAAACGCGTG GAAAGCTGA 

This encodes the following amino acid sequence <SEQ ID 122; ORF20ng-1>:



1 MNMLGALAKV GSLTMVSRVL GFVRDTVIAR AFGAGMATDA FFVAFKLPNL 

51 LRRVFAEGAF AQAFVPILAE YKETRSKEAT EAFIRHVAGM LSFVLIVVTA

101 LGILAAPWVI YVSAPGFTKD ADKFQLSISL LRITFPYILL ISLSSFVGSI

151 LNSYHKFGIP AFTPTFLNIS FIVFALFFVP YFDPPVTALA WAVFVGGILQ

201 LGFQLPWLAK LGFLKLPKLN FKDAAVNRVM KQMAPAILGV SVAQISLVIN

251 TIFASYLQSG SVSWMYYADR MMELRRGVLG AALGTILLPT LSKHSANQDT

301 EQFSALLDWG LRLCMLLTLP AAAGLAVLSF PLVATLFMYR EFTLFDAQMT

351 QHALIAYSFG LIGLIMIKVL ASGFYARQNI KTPVKIAIFT LICTQLMNLA

401 FIGPLKHAGL SLAIGLGACI NAGLLFFLLR KHGIYRPGRG WAAFLAKMLL

451 ALAVMCGGLW AAQACLPFEW AHAGGMRKAG QLCILIAVGG GLYFASLAAL

501 GFRPRHFKRV ES* 

ORF20ng-1 and ORF20-1 show 95.7% identity in 512 aa overlap:
orf20-1.pep NAGLLFYLLRRHGIYQPGKGWAAFLAKMLLSLAVMCGGLWAAQAYLPFEWAHAGGMRKAG
I | I I I I : I I I : I I I I : I I : I I I I I I I I I I I : I I I I I I I I I I II I I I I I I I I I I I I I I I I 
orf20-1.pep QLCILIAVGGGLYFASLAALGFRPRHFKRVENX
I I I I II I I I I I I I I I I I I I I I I II I I I I I I I: 1 
orf20ng-1 QLCILIAVGGGLYFASLAALGFRPRHFKRVESX

490 500 510 

In addition, ORF20ng-1 shows significant homology with a virulence factor of S. typhimurium:

sp|P37169|MVIN_SALTY VIRULENCE FACTOR MVIN pir||S40271 mviN protein - Salmonella
typhimurium gi|438252 (Z26133) mviB gene product [Salmonella typhimurium]
gnl|PID|d1005521 (D25292) ORF2 [Salmonella typhimurium] Length = 524

Score = 1573 (750.1 bits), Expect = 1.1e-220, Sum P(2) = 1.1e-220

Identities = 309/467 (66%), Positives = 368/467 (78%) 

Query: 1 MNMLGALAKVGSLTMVSRVLGFVRDTVIARAFGAGMATDAFFVAFKLPNLLRRVFAEGAF 60

MN+L +LA V S+TM SRVLGF RD ++AR FGAGMATDAFFVAFKLPNLLRR+FAEGAF
Sbjct: 14

Query: 61

+QAFVPILAEYK + +EAT F+ +V+G+L+ L VVT G+LAAPWVI V+APGF
Sbjct: 74

Query: 121

ADKF L+ LLRITFPYILLISL+S VG+ILN++++F IPAF PTFLNIS I FALF P
Sbjct: 134

Query: 181

YF+PPV ALAWAV VGG+LQL +QLP+L K+G L LP++NF+D    RV+KQM PAILGV
Sbjct: 194

Query: 241

SV+QISL+INTIFAS+L SGSVSWMYYADR+ME GVLG ALGTILLP+LSK A+ +
Sbjct: 254

Query: 301 EQFSALLDWGLRLCMLLTLPAAAGLAVLSFPLVATLFMYREFTLFDAQMTQHALIAYSFG 360

+++ L+DWGLRLC LL LP+A L +L+ PL +LF Y +FT FDA MTQ ALIAYS G 
Sbjct: 314 DEYCRLMDWGLRLCFLLALPSAVALGILAKPLTVSLFQYGKFTAFDAAMTQRALIAYSVG 373 

Query: 361 LIGLIMIKVLASGFYARQNIKTPVKIAIFTLICTQLMNLAFIGPLKHAGLSLAIGLGACI 420 

LIGLI++KVLA GFY+RQ+IKTPVKIAI TLI TQLMNLAFIGPLKHAGLSL+IGL AC+ 
Sbjct: 374 LIGLIVVKVLAPGFYSRQDIKTPVKIAIVTLIMTQLMNLAFIGPLKHAGLSLSIGLAACL 433

Query: 421 NAGLLFFLLRKHGIYRPGRGWXXXXXXXXXXXXVMCGGLWAAQACLP 467

NA LL++ LRK I+ P GW VM L+ +P

Sbjct: 434 NASLLYWQLRKQNIFTPQPGWMWFLMRLIISVLVMAAVLFGVLHIMP 480

Score = 70 (33.4 bits), Expect = 1.1e-220, Sum P(2) = 1.1e-220
Identities = 14/41 (34%), Positives = 23/41 (56%)

Query: 469 EWAHAGGMRKAGQLCILIAVGGGLYFASLAALGFRPRHFKR 509

EW+ + + +L ++ G YFA+LA LGF+ + F R 
Sbjct: 481 EWSQGSMLWRLLRLMAWIAGIAAYFAALAVLGFKVKEFVR 521 

Based on this analysis, including the homology with a virulence factor from S. typhimurium, it is
predicted that these proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be 
useful antigens for vaccines or diagnostics, or for raising antibodies. 



Example 15 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 123>: 

1 atGATTAAAA TCAAAAAAGG TCTAAACCTG CCCATCGCGG GCAGACCGGA 

51 GCAAGCCGTT tACGACGGCC CGGCCaTTAC CGAAGtCGCG TTGCTTGGCG 

101 AAGAATATGC CGGTATGCGC CCCTCGATGA AAGTCAAGGA AGGCGATGCC 

151 GTcAAAAAAG GCCAAGTGCT GTTTGAAGAC AAAAAGAATC CGGGCGTGGT 

201 GTTTACTGCG CCGGCTTCAG GcAAAATCGC CGCGATTCAC CGTGGCGAAA 

251 AGCGCGTACT TCAGTCAGTC GTGATTGCCG TTGAArGCAA CGACGAAATC 

301 GAGTTTGAAC GCTACGCACC TGAAGCGCTG GCAAACTTAA GCGGCGAAGA 

351 AGTGCGCCGC AACCTGATCC AATCCGGTTT GTGGACTGCG CTGCGCACCC 

401 GTCCGTTCAG CAAAATTCCT GCCGTCGATG CCGAGCCGTT CGCCATCTTC 

451 GTCAATGCGA tGGACACCAA TCCG..

This corresponds to the amino acid sequence <SEQ ID 124; ORF22>: 



1 MIKIKKGLNL PIAGRPEQAV YDGPAITEVA LLGEEYAGMR PSMKVKEGDA 
51 VKKGQVLFED KKNPGVVFTA PASGKIAAIH RGEKRVLQSV VIAVEXNDEI
101 EFERYAPEAL ANLSGEEVRR NLIQSGLWTA LRTRPFSKIP AVDAEPFAIF
151 VNAMDTNP..

Further work revealed the complete nucleotide sequence <SEQ ID 125>: 



1 ATGATTAAAA TCAAAAAAGG TCTAAACCTG CCCATCGCGG GCAGACCGGA

51 GCAAGCCGTT TACGACGGCC CGGCCATTAC CGAAGTCGCG TTGCTTGGCG

101 AAGAATATGC CGGTATGCGC CCCTCGATGA AAGTCAAGGA AGGCGATGCC

151 GTCAAAAAAG GCCAAGTGCT GTTTGAAGAC AAAAAGAATC CGGGCGTGGT

201 GTTTACTGCG CCGGCTTCAG GCAAAATCGC CGCGATTCAC CGTGGCGAAA

251 AGCGCGTACT TCAGTCAGTC GTGATTGCCG TTGAAGGCAA CGACGAAATC

301 GAGTTTGAAC GCTACGCACC TGAAGCGCTG GCAAACTTAA GCGGCGAAGA

351 AGTGCGCCGC AACCTGATCC AATCCGGTTT GTGGACTGCG CTGCGCACCC

401 GTCCGTTCAG CAAAATTCCT GCCGTCGATG CCGAGCCGTT CGCCATCTTC

451 GTCAATGCGA TGGACACCAA TCCGCTGGCT GCCGACCCTA CGGTCATTAT

501 CAAAGAAGCC GCCGAGGATT TCAAACGCGG CCTGTTGGTA TTGAGCCGTT

551 TGACCGAACG CAAAATCCAT GTTTGTAAGG CAGCTGGCGC AGACGTGCCG

601 TCTGAAAATG CTGCCAACAT CGAAACACAT GAATTCGGCG GCCCGCATCC

651 TGCCGGTTTG AGTGGCACGC ACATTCATTT CATCGAGCCG GTCGGCGCGA

701 ATAAAACCGT GTGGACCATC AATTATCAAG ATGTAATTAC CATTGGCCGT

751 TTGTTTGCAA CAGGCCGTCT GAACACCGAG CGCGTGATTG CCCTAGGTGG

801 TTCTCAAGTC AACAAACCGC GCCTCTTGCG TACCGTTTTG GGTGCGAAAG

851 TATCGCAAAT TACTGCGGGC GAATTGGTTG ACACAGACAA CCGCGTGATT

901 TCCGGTTCGG TATTGAACGG CGCGATTACA CAAGGCGCGC ACGATTATTT

951 GGGACGCTAC CACAATCAGA TTTCCGTTAT CGAAGAAGGC CGCAGCAAAG

1001 AGCTGTTCGG CTGGGTTGCG CCGCAGCCGG ACAAATACTC CATCACGCGT 

1051 ACAACCCTCG GCCATTTCCT GAAAAACAAA CTCTTCAAGT TCAACACAGC 

1101 CGTCAACGGC GGCGACCGCG CCATGGTGCC GATTGGTACT TACGAGCGCG 

1151 TGATGCCCTT GGATATCCTG CCCACCCTGC TTTTGCGCGA TTTAATCGTC 

1201 GGCGATACCG ACAGCGCGCA GGCATTGGGT TGCTTGGAAT TGGACGAAGA 

1251 AGACCTCGCT TTGTGCAGCT TCGTCTGCCC GGGCAAATAC GAATACGGCC 

1301 CGCTGTTGCG CAAAGTGCTG GAAACCATTG AGAAGGAAGG CTGA 

This corresponds to the amino acid sequence <SEQ ID 126; ORF22-1>:

1 MIKIKKGLNL PIAGRPEQAV YDGPAITEVA LLGEEYAGMR PSMKVKEGDA 

51 VKKGQVLFED KKNPGVVFTA PASGKIAAIH RGEKRVLQSV VIAVEGNDEI

101 EFERYAPEAL ANLSGEEVRR NLIQSGLWTA LRTRPFSKIP AVDAEPFAIF 

151 VNAMDTNPLA ADPTVIIKEA AEDFKRGLLV LSRLTERKIH VCKAAGADVP

201 SENAANIETH EFGGPHPAGL SGTHIHFIEP VGANKTVWTI NYQDVITIGR 

251 LFATGRLNTE RVIALGGSQV NKPRLLRTVL GAKVSQITAG ELVDTDNRVI 

301 SGSVLNGAIT QGAHDYLGRY HNQISVIEEG RSKELFGWVA PQPDKYSITR 

351 TTLGHFLKNK LFKFNTAVNG GDRAMVPIGT YERVMPLDIL PTLLLRDLIV 

401 GDTDSAQALG CLELDEEDLA LCSFVCPGKY EYGPLLRKVL ETIEKEG* 

Further work identified the corresponding gene in strain A of N. meningitidis <SEQ ID 127>: 

1 ATGATTAAAA TCAAAAAAGG TCTAAACCTG CCCATCGCGG GCAGACCGGA 

51 GCAAGTCATT TATGACGGGC CCGTCATTAC CGAAGTCGCG TTGCTTGGCG 

101 AAGAATATGC CGGTATGCGC CCCTNGATGA AAGTCAAGGA AGGCGATGCC 

151 GTCAAAAAAG GCCAAGTGCT GTTTGAAGAC AAAAAGNATC CGGGCGTGGT 

201 GTTTACCGCG CCNGTTTCAG GCAAAATCGC CGCCATCCAT CGCGGCGAAA 

251 AGCGCGTACT TCAGTCGGTC GTGATTGCCG TTGAAGGCAA CGACGAAATC 

301 GAGTTCGAAC GCTACGCGCC CGAAGCGTTG GCAAACTTAA GCGGCGANGA 

351 ANTNNGNNGC AATCTGATCC AATCCGGTTT GTGGACTGCG CTGCGTANCC 

401 GTCCGTTCAG CAAAATCCCT GCCGTCGATG CCGAGCCGTT CGCCATCTTC 

451 GTCAATGCGA TGGACACCAA TCCGCTNGCG GCAGACCCTG TGGTTGTGAT 

501 CAAAGAAGCC GNCGANGATT TCAGACGANG TNTGCTGGTA TTGAGCCGTT 

551 TGACCGAGCG TAAAATCCAT GTGTGTAAGG CAGCTGGCGC AGACGTGCCG 

601 TCTGAAAATG CTGCCAACAT CGAAACACAT GAATTCGGCG GCCCGCATCC 

651 GGCCGGTTTG AGTGGCACGC ACATTCATTT CATTGAGCCG GTCGGTGCAA 

701 ACAAAACCGT TTGGACCATC AATTATCAAG ATGTAATTGC CATCGGACGT 

751 TTGTTTGCAA CAGGCCGTCT GAACACCGAG CGCGTGATTG CTTTGGGTGG 

801 TTCTCAAGTC AACAAACCAC GCCTCTTGCG TACCGTTTTG GGTGCGAAAG 

851 TATCGCAAAT TACTGCGGGC GAATTGGTTG ACGCAGACAA CCGCGTGATT 

901 TCCGGTTCGG TATTGAACGG CGCGATTACA CAAGGCGCGC ACGATTATTT 

951 GGGACGCTAC CACAATCAGA TTTCCGTTAT CGAAGAAGGC CGCAGCAAAG 

1001 AGCTGTTCGG CTGGGTTGCG CCGCAGCCGG ACAAATACTC CATCACGCGT 

1051 ACGACCCTCG GCCATTTCCT GAAAAACAAA CTCTTCAAGT TCACGACAGC 

1101 CGTCAACGGT GGCGACCGCG CCATGGTGCC GATTGGTACT TACGAGCGCG 

1151 TAATGCCGCT AGACATCCTG CCTACCCTGC TTTTGCGCGA TTTAATCGTC 

1201 GGCGATACCG ACAGCGCGCA AGCATTGGGT TGCTTGGAAT TGGACGAAGA 

1251 AGACCTCGCT TTGTGCAGCT TCGTCTGCCC GGGCAAATAC GAATANGGCC 

1301 CGCTGTTGCG TAAGGTGCTG GAAACCNTTG AGAAGGAAGG CTGA 

This encodes a protein having amino acid sequence <SEQ ID 128; ORF22a>: 

1 MIKIKKGLNL PIAGRPEQVI YDGPVITEVA LLGEEYAGMR PXMKVKEGDA 

51 VKKGQVLFED KKXPGVVFTA PVSGKIAAIH RGEKRVLQSV VIAVEGNDEI

101 EFERYAPEAL ANLSGXEXXX NLIQSGLWTA LRXRPFSKIP AVDAEPFAIF 

151 VNAMDTNPLA ADPVVVIKEA XXDFRRXXLV LSRLTERKIH VCKAAGADVP

201 SENAANIETH EFGGPHPAGL SGTHIHFIEP VGANKTVWTI NYQDVIAIGR 

251 LFATGRLNTE RVIALGGSQV NKPRLLRTVL GAKVSQITAG ELVDADNRVI 

301 SGSVLNGAIT QGAHDYLGRY HNQISVIEEG RSKELFGWVA PQPDKYSITR 

351 TTLGHFLKNK LFKFTTAVNG GDRAMVPIGT YERVMPLDIL PTLLLRDLIV 

401 GDTDSAQALG CLELDEEDLA LCSFVCPGKY EXGPLLRKVL ETXEKEG* 

The originally-identified partial strain B sequence (ORF22) shows 94.2% identity over a 158aa 
overlap with ORF22a: 

10 20 30 40 50 60 

orf22.pep MIKIKKGLNLPIAGRPEQAVYDGPAITEVALLGEEYAGMRPSMKVKEGDAVKKGQVLFED
MINI j!llllllllli::l!!|:|||liillllllllM I I I I I I I I I I I I I I I I I I 
orf22a MIKIKKGLNLPIAGRPEQVIYDGPVITEVALLGEEYAGMRPXMKVKEGDAVKKGQVLFED
orf20-1.pep ADKFQLSIDLLRITFPYILLISLSSFVGSVLNSYHKFGIPAFTPTFLNVSFIVFALFFVP
I I I I I I I I : I I I I I I I I I I I I I I I I I I I I: I I I I I I I I I I I I I I I I I I : I I I I I I I I i I I 
orf20ng-l ADKFQLSISLLRITFPYILLISLSSFVGSILNSYHKFGIPAFTPTFLNISFIVFALFFVP 

130 140 150 160 170 180 

190 200 210 220 230 240 

orf20-1.pep YFDPPVTALAWAVFVGGILQLGFQLPWLAKLGFLKLPKLSFKDAAVNRVMKQMAPAILGV
            |||||||||||||||||||||||||||||||||||||||:||||||||||||||||||||
orf20ng-1   YFDPPVTALAWAVFVGGILQLGFQLPWLAKLGFLKLPKLNFKDAAVNRVMKQMAPAILGV

190 200 210 220 230 240 

250 260 270 280 290 300 

orf20-1.pep SVAQVSLVINTIFASYLQSGSVSWMYYADRMMELPSGVLGAALGTILLPTLSKHSANQDT
            ||||:|||||||||||||||||||||||||||||  ||||||||||||||||||||||||
orf20ng-1   SVAQISLVINTIFASYLQSGSVSWMYYADRMMELRRGVLGAALGTILLPTLSKHSANQDT

250 260 270 280 290 300 

310 320 330 340 350 360 

orf20-1.pep EQFSALLDWGLRLCMLLTLPAAVGLAVLSFPLVATLFMYREFTLFDAQMTQHALIAYSFG
            ||||||||||||||||||||||:|||||||||||||||||||||||||||||||||||||
orf20ng-1   EQFSALLDWGLRLCMLLTLPAAAGLAVLSFPLVATLFMYREFTLFDAQMTQHALIAYSFG

310 320 330 340 350 360 

370 380 390 400 410 420 

orf20-1.pep LIGLIMIKVLAPGFYARQNIKTPVKIAIFTLICTQLMNLAFIGPLKHVGLSLAIGLGACI
            ||||||||||| |||||||||||||||||||||||||||||||||||:||||||||||||
orf20ng-1   LIGLIMIKVLASGFYARQNIKTPVKIAIFTLICTQLMNLAFIGPLKHAGLSLAIGLGACI

370 380 390 400 410 420 

430 440 450 460 470 480 

orf20-1.pep NAGLLFYLLRRHGIYQPGKGWAAFLAKMLLSLAVMCGGLWAAQAYLPFEWAHAGGMRKAG
            ||||||:|||:||||:||:|||||||||||:||||||||||||| |||||||||||||||
orf20ng-1   NAGLLFFLLRKHGIYRPGRGWAAFLAKMLLALAVMCGGLWAAQACLPFEWAHAGGMRKAG

430 440 450 460 470 480 
orf20-1.pep QLCILIAVGGGLYFASLAALGFRPRHFKRVENX
            |||||||||||||||||||||||||||||||:|
orf20ng-1   QLCILIAVGGGLYFASLAALGFRPRHFKRVESX

490 500 510 

In addition, ORF20ng-1 shows significant homology with a virulence factor of S. typhimurium:

sp|P37169|MVIN_SALTY VIRULENCE FACTOR MVIN pir||S40271 mviN protein - Salmonella
typhimurium gi|438252 (Z26133) mviB gene product [Salmonella typhimurium]
gnl|PID|d1005521 (D25292) ORF2 [Salmonella typhimurium] Length = 524

Score = 1573 (750.1 bits), Expect = 1.1e-220, Sum P(2) = 1.1e-220

Identities = 309/467 (66%), Positives = 368/467 (78%)

Query: 1 MNMLGALAKVGSLTMVSRVLGFVRDTVIARAFGAGMATDAFFVAFKLPNLLRRVFAEGAF 60

MN+L +LA V S+TM SRVLGF RD ++AR FGAGMATDAFFVAFKLPNLLRR+FAEGAF
Sbjct: 14

Query: 61

+QAFVPILAEYK + +EAT F+ +V+G+L+ L WT G+LAAPWVI V+APGF
Sbjct: 74

Query: 121

ADKF L+ LLRITFPYILLISL+S VG+ILN++++F IPAF PTFLNIS I FALF P
Sbjct: 134

Query: 181

YF+PPV ALAWAV VGG+LQL +QLP+L K+G L LP++NF+D    RV+KQM PAILGV
Sbjct: 194

Query: 241

SV+QISL+INTIFAS+L SGSVSWMYYADR+ME GVLG ALGTILLP+LSK A+ +
Sbjct: 254

Query: 301 EQFSALLDWGLRLCMLLTLPAAAGLAVLSFPLVATLFMYREFTLFDAQMTQHALIAYSFG 360

+++ L+DWGLRLC LL LP+A L +L+ PL +LF Y +FT FDA MTQ ALIAYS G 
Sbjct: 314 DEYCRLMDWGLRLCFLLALPSAVALGILAKPLTVSLFQYGKFTAFDAAMTQRALIAYSVG 373 

Query: 361 LIGLIMIKVLASGFYARQNIKTPVKIAIFTLICTQLMNLAFIGPLKHAGLSLAIGLGACI 420 

LIGLI++KVLA GFY+RQ+IKTPVKIAI TLI TQLMNLAFIGPLKHAGLSL+IGL AC+ 
Sbjct: 374 LIGLIVVKVLAPGFYSRQDIKTPVKIAIVTLIMTQLMNLAFIGPLKHAGLSLSIGLAACL 433

Query: 421 NAGLLFFLLRKHGIYRPGRGWXXXXXXXXXXXXVMCGGLWAAQACLP 467

NA LL++ LRK I+ P GW VM L+ +P

Sbjct: 434 NASLLYWQLRKQNIFTPQPGWMWFLMRLIISVLVMAAVLFGVLHIMP 480

Score = 70 (33.4 bits), Expect = 1.1e-220, Sum P(2) = 1.1e-220
Identities = 14/41 (34%), Positives = 23/41 (56%)

Query: 469 EWAHAGGMRKAGQLCILIAVGGGLYFASLAALGFRPRHFKR 509

EW+ + + +L ++ G YFA+LA LGF+ + F R 
Sbjct: 481 EWSQGSMLWRLLRLMAWIAGIAAYFAALAVLGFKVKEFVR 521 
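The middle line of each BLAST alignment row encodes the comparison. It shows a residue letter where query and subject are identical, `+` where they differ but score positively, and a blank otherwise. Under that convention, the Identities and Positives tallies follow directly from the midlines. The sketch below assumes that convention and ignores gap characters, which appear only in the query and subject lines. The whole-number percentages here appear to be truncated rather than rounded: 368/467 = 78.8% is reported as 78%. `percent` therefore truncates as well, which is an inference and not a documented setting.

```python
def tally_midline(midline: str) -> tuple[int, int]:
    """Count identities and positives in a BLAST-style alignment midline.

    Assumed convention: a letter marks an identical position, '+' marks a
    positive-scoring substitution, and a space marks anything else.
    """
    identities = sum(1 for ch in midline if ch.isalpha())
    positives = identities + midline.count("+")
    return identities, positives


def percent(part: int, whole: int) -> int:
    """Whole-number percentage, truncated toward zero."""
    return 100 * part // whole


if __name__ == "__main__":
    # First eight columns of the ORF20ng-1 / MviN midline above.
    print(tally_midline("MN+L +LA"))
    print(percent(309, 467), percent(368, 467))
```

Applied to the full set of midlines, the two counts should reproduce the "Identities" and "Positives" figures in the header of each alignment.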

Based on this analysis, including the homology with a virulence factor from S. typhimurium, it is
predicted that these proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be
useful antigens for vaccines or diagnostics, or for raising antibodies.



Example 15 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 123>: 

1 atGATTAAAA TCAAAAAAGG TCTAAACCTG CCCATCGCGG GCAGACCGGA 

51 GCAAGCCGTT tACGACGGCC CGGCCaTTAC CGAAGtCGCG TTGCTTGGCG 

101 AAGAATATGC CGGTATGCGC CCCTCGATGA AAGTCAAGGA AGGCGATGCC 

151 GTCAAAAAAG GCCAAGTGCT GTTTGAAGAC AAAAAGAATC CGGGCGTGGT 

201 GTTTACTGCG CCGGCTTCAG GcAAAATCGC CGCGATTCAC CGTGGCGAAA 

251 AGCGCGTACT TCAGTCAGTC GTGATTGCCG TTGAArGCAA CGACGAAATC 

301 GAGTTTGAAC GCTACGCACC TGAAGCGCTG GCAAACTTAA GCGGCGAAGA 

351 AGTGCGCCGC AACCTGATCC AATCCGGTTT GTGGACTGCG CTGCGCACCC 

401 GTCCGTTCAG CAAAATTCCT GCCGTCGATG CCGAGCCGTT CGCCATCTTC 

451 GTCAATGCGA tGGACACCAA TCCG. . 

This corresponds to the amino acid sequence <SEQ ID 124; ORF22>:

1 MIKIKKGLNL PIAGRPEQAV YDGPAITEVA LLGEEYAGMR PSMKVKEGDA 

51 VKKGQVLFED KKNPGVVFTA PASGKIAAIH RGEKRVLQSV VIAVEXNDEI

101 EFERYAPEAL ANLSGEEVRR NLIQSGLWTA LRTRPFSKIP AVDAEPFAIF 

151 VNAMDTNP. . 

Further work revealed the complete nucleotide sequence <SEQ ID 125>: 

1 ATGATTAAAA TCAAAAAAGG TCTAAACCTG CCCATCGCGG GCAGACCGGA 

51 GCAAGCCGTT TACGACGGCC CGGCCATTAC CGAAGTCGCG TTGCTTGGCG 

101 AAGAATATGC CGGTATGCGC CCCTCGATGA AAGTCAAGGA AGGCGATGCC 

151 GTCAAAAAAG GCCAAGTGCT GTTTGAAGAC AAAAAGAATC CGGGCGTGGT 

201 GTTTACTGCG CCGGCTTCAG GCAAAATCGC CGCGATTCAC CGTGGCGAAA 

251 AGCGCGTACT TCAGTCAGTC GTGATTGCCG TTGAAGGCAA CGACGAAATC 

301 GAGTTTGAAC GCTACGCACC TGAAGCGCTG GCAAACTTAA GCGGCGAAGA 

351 AGTGCGCCGC AACCTGATCC AATCCGGTTT GTGGACTGCG CTGCGCACCC 

401 GTCCGTTCAG CAAAATTCCT GCCGTCGATG CCGAGCCGTT CGCCATCTTC 

451 GTCAATGCGA TGGACACCAA TCCGCTGGCT GCCGACCCTA CGGTCATTAT 

501 CAAAGAAGCC GCCGAGGATT TCAAACGCGG CCTGTTGGTA TTGAGCCGTT 

551 TGACCGAACG CAAAATCCAT GTTTGTAAGG CAGCTGGCGC AGACGTGCCG 

601 TCTGAAAATG CTGCCAACAT CGAAACACAT GAATTCGGCG GCCCGCATCC 

651 TGCCGGTTTG AGTGGCACGC ACATTCATTT CATCGAGCCG GTCGGCGCGA 

701 ATAAAACCGT GTGGACCATC AATTATCAAG ATGTAATTAC CATTGGCCGT 

751 TTGTTTGCAA CAGGCCGTCT GAACACCGAG CGCGTGATTG CCCTAGGTGG 

801 TTCTCAAGTC AACAAACCGC GCCTCTTGCG TACCGTTTTG GGTGCGAAAG 

851 TATCGCAAAT TACTGCGGGC GAATTGGTTG ACACAGACAA CCGCGTGATT 

901 TCCGGTTCGG TATTGAACGG CGCGATTACA CAAGGCGCGC ACGATTATTT 
951 GGGACGCTAC CACAATCAGA TTTCCGTTAT CGAAGAAGGC CGCAGCAAAG

1001 AGCTGTTCGG CTGGGTTGCG CCGCAGCCGG ACAAATACTC CATCACGCGT 

1051 ACAACCCTCG GCCATTTCCT GAAAAACAAA CTCTTCAAGT TCAACACAGC 

1101 CGTCAACGGC GGCGACCGCG CCATGGTGCC GATTGGTACT TACGAGCGCG 

1151 TGATGCCCTT GGATATCCTG CCCACCCTGC TTTTGCGCGA TTTAATCGTC 

1201 GGCGATACCG ACAGCGCGCA GGCATTGGGT TGCTTGGAAT TGGACGAAGA 

1251 AGACCTCGCT TTGTGCAGCT TCGTCTGCCC GGGCAAATAC GAATACGGCC 

1301 CGCTGTTGCG CAAAGTGCTG GAAACCATTG AGAAGGAAGG CTGA 

This corresponds to the amino acid sequence <SEQ ID 126; ORF22-1>:

1 MIKIKKGLNL PIAGRPEQAV YDGPAITEVA LLGEEYAGMR PSMKVKEGDA 

51 VKKGQVLFED KKNPGVVFTA PASGKIAAIH RGEKRVLQSV VIAVEGNDEI

101 EFERYAPEAL ANLSGEEVRR NLIQSGLWTA LRTRPFSKIP AVDAEPFAIF 

151 VNAMDTNPLA ADPTVIIKEA AEDFKRGLLV LSRLTERKIH VCKAAGADVP 

201 SENAANIETH EFGGPHPAGL SGTHIHFIEP VGANKTVWTI NYQDVITIGR 

251 LFATGRLNTE RVIALGGSQV NKPRLLRTVL GAKVSQITAG ELVDTDNRVI 

301 SGSVLNGAIT QGAHDYLGRY HNQISVIEEG RSKELFGWVA PQPDKYSITR 

351 TTLGHFLKNK LFKFNTAVNG GDRAMVPIGT YERVMPLDIL PTLLLRDLIV 

401 GDTDSAQALG CLELDEEDLA LCSFVCPGKY EYGPLLRKVL ETIEKEG* 
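Each protein listing in this Example is the conceptual translation of the preceding DNA listing in reading frame 1, using the standard genetic code. Below is a minimal Python sketch of that step. It assumes an unambiguous sequence and strips the position numbers and spaces used in the listings. The function name `translate` is illustrative only.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1). Codons are enumerated
# with the bases in TCAG order at each position, the classic table layout.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate DNA in reading frame 1; stop codons become '*'.

    Digits and whitespace from the numbered listing are dropped. A
    trailing partial codon is ignored. Any base other than A/C/G/T raises
    KeyError instead of being skipped, which would shift the frame.
    """
    seq = "".join(ch for ch in dna.upper() if ch.isalpha())
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))


if __name__ == "__main__":
    line = "1 ATGATTAAAA TCAAAAAAGG TCTAAACCTG CCCATCGCGG GCAGACCGGA"
    print(translate(line))
```

The first 50-nucleotide line of SEQ ID 125 holds 16 complete codons, and it translates to the first 16 residues of ORF22-1 (MIKIKKGLNLPIAGRP).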

Further work identified the corresponding gene in strain A of N. meningitidis <SEQ ID 127>: 

1 ATGATTAAAA TCAAAAAAGG TCTAAACCTG CCCATCGCGG GCAGACCGGA 

51 GCAAGTCATT TATGACGGGC CCGTCATTAC CGAAGTCGCG TTGCTTGGCG 

101 AAGAATATGC CGGTATGCGC CCCTNGATGA AAGTCAAGGA AGGCGATGCC 

151 GTCAAAAAAG GCCAAGTGCT GTTTGAAGAC AAAAAGNATC CGGGCGTGGT 

201 GTTTACCGCG CCNGTTTCAG GCAAAATCGC CGCCATCCAT CGCGGCGAAA 

251 AGCGCGTACT TCAGTCGGTC GTGATTGCCG TTGAAGGCAA CGACGAAATC 

301 GAGTTCGAAC GCTACGCGCC CGAAGCGTTG GCAAACTTAA GCGGCGANGA 

351 ANTNNGNNGC AATCTGATCC AATCCGGTTT GTGGACTGCG CTGCGTANCC 

401 GTCCGTTCAG CAAAATCCCT GCCGTCGATG CCGAGCCGTT CGCCATCTTC 

451 GTCAATGCGA TGGACACCAA TCCGCTNGCG GCAGACCCTG TGGTTGTGAT 

501 CAAAGAAGCC GNCGANGATT TCAGACGANG TNTGCTGGTA TTGAGCCGTT 

551 TGACCGAGCG TAAAATCCAT GTGTGTAAGG CAGCTGGCGC AGACGTGCCG 

601 TCTGAAAATG CTGCCAACAT CGAAACACAT GAATTCGGCG GCCCGCATCC 

651 GGCCGGTTTG AGTGGCACGC ACATTCATTT CATTGAGCCG GTCGGTGCAA 

701 ACAAAACCGT TTGGACCATC AATTATCAAG ATGTAATTGC CATCGGACGT 

751 TTGTTTGCAA CAGGCCGTCT GAACACCGAG CGCGTGATTG CTTTGGGTGG 

801 TTCTCAAGTC AACAAACCAC GCCTCTTGCG TACCGTTTTG GGTGCGAAAG 

851 TATCGCAAAT TACTGCGGGC GAATTGGTTG ACGCAGACAA CCGCGTGATT 

901 TCCGGTTCGG TATTGAACGG CGCGATTACA CAAGGCGCGC ACGATTATTT 

951 GGGACGCTAC CACAATCAGA TTTCCGTTAT CGAAGAAGGC CGCAGCAAAG 

1001 AGCTGTTCGG CTGGGTTGCG CCGCAGCCGG ACAAATACTC CATCACGCGT 

1051 ACGACCCTCG GCCATTTCCT GAAAAACAAA CTCTTCAAGT TCACGACAGC 

1101 CGTCAACGGT GGCGACCGCG CCATGGTGCC GATTGGTACT TACGAGCGCG 

1151 TAATGCCGCT AGACATCCTG CCTACCCTGC TTTTGCGCGA TTTAATCGTC 

1201 GGCGATACCG ACAGCGCGCA AGCATTGGGT TGCTTGGAAT TGGACGAAGA 

1251 AGACCTCGCT TTGTGCAGCT TCGTCTGCCC GGGCAAATAC GAATANGGCC 

1301 CGCTGTTGCG TAAGGTGCTG GAAACCNTTG AGAAGGAAGG CTGA 

This encodes a protein having amino acid sequence <SEQ ID 128; ORF22a>: 

1 MIKIKKGLNL PIAGRPEQVI YDGPVITEVA LLGEEYAGMR PXMKVKEGDA 

51 VKKGQVLFED KKXPGVVFTA PVSGKIAAIH RGEKRVLQSV VIAVEGNDEI

101 EFERYAPEAL ANLSGXEXXX NLIQSGLWTA LRXRPFSKIP AVDAEPFAIF 

151 VNAMDTNPLA ADPVVVIKEA XXDFRRXXLV LSRLTERKIH VCKAAGADVP

201 SENAANIETH EFGGPHPAGL SGTHIHFIEP VGANKTVWTI NYQDVIAIGR 

251 LFATGRLNTE RVIALGGSQV NKPRLLRTVL GAKVSQITAG ELVDADNRVI 

301 SGSVLNGAIT QGAHDYLGRY HNQISVIEEG RSKELFGWVA PQPDKYSITR 

351 TTLGHFLKNK LFKFTTAVNG GDRAMVPIGT YERVMPLDIL PTLLLRDLIV 

401 GDTDSAQALG CLELDEEDLA LCSFVCPGKY EXGPLLRKVL ETXEKEG* 
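Unlike the strain B gene, the strain A listing contains undetermined bases (N), and ORF22a therefore contains X residues. The listings fit a simple rule: an ambiguous codon is translated to a residue only when every base it could stand for gives the same amino acid. For example, codons 70-72 (GCG CCN GTT) still appear as A P V because every CCN codon encodes proline. Codon 42 (TNG) appears as X because it could be a stop, Ser, Trp or Leu. The sketch below implements that rule. It is inferred from the listings and is not a description of the software actually used; all names in it are illustrative.

```python
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# IUPAC nucleotide codes and the concrete bases each one can stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def translate_codon(codon: str) -> str:
    """Residue encoded by every reading of an ambiguous codon, else 'X'."""
    readings = product(*(IUPAC[base] for base in codon.upper()))
    residues = {CODON_TABLE["".join(r)] for r in readings}
    return residues.pop() if len(residues) == 1 else "X"


def translate(dna: str) -> str:
    """Frame-1 translation of a listing that may contain IUPAC codes."""
    seq = "".join(ch for ch in dna.upper() if ch.isalpha())
    return "".join(translate_codon(seq[i:i + 3]) for i in range(0, len(seq) - 2, 3))


if __name__ == "__main__":
    print(translate("GCGCCNGTT"), translate_codon("TNG"))
```

Characters that are not IUPAC codes raise KeyError. The damaged read in the partial ORF12 listing would need cleaning before it could be translated this way.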

The originally-identified partial strain B sequence (ORF22) shows 94.2% identity over a 158aa
overlap with ORF22a:

10 20 30 40 50 60 

orf22.pep MIKIKKGLNLPIAGRPEQAVYDGPAITEVALLGEEYAGMRPSMKVKEGDAVKKGQVLFED
          ||||||||||||||||||::||||:|||||||||||||||| ||||||||||||||||||
orf22a    MIKIKKGLNLPIAGRPEQVIYDGPVITEVALLGEEYAGMRPXMKVKEGDAVKKGQVLFED
70 80 90 100 110 120

orf22.pep KKNPGVVFTAPASGKIAAIHRGEKRVLQSVVIAVEXNDEIEFERYAPEALANLSGEEVRR
          || ||||||||:||||||||||||||||||||||| ||||||||||||||||||| |
orf22a    KKXPGVVFTAPVSGKIAAIHRGEKRVLQSVVIAVEGNDEIEFERYAPEALANLSGXEXXX

70 80 90 100 110 120 

130 140 150 

orf22.pep NLIQSGLWTALRTRPFSKIPAVDAEPFAIFVNAMDTNP
          |||||||||||| |||||||||||||||||||||||||
orf22a    NLIQSGLWTALRXRPFSKIPAVDAEPFAIFVNAMDTNPLAADPVVVIKEAXXDFRRXXLV

130 140 150 160 170 180 
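The identity figures quoted for these comparisons are percentages of identical residues over an ungapped overlap. The sketch below shows that arithmetic, together with a match line in the style of the alignments above. It assumes that X (undetermined) residues never count as identical. Unlike the alignments in the text, it does not mark conservative substitutions with ':'. The text does not state how the original program scored X, so figures from this sketch need not match those quoted. The function names are hypothetical.

```python
def identity_over_overlap(a: str, b: str) -> tuple[int, int, float]:
    """Identical positions, overlap length and percent identity.

    `a` and `b` are assumed to be already aligned without gaps, starting
    at the same position; the overlap is the length of the shorter one.
    """
    overlap = min(len(a), len(b))
    same = sum(1 for x, y in zip(a, b) if x == y and x != "X")
    return same, overlap, 100.0 * same / overlap


def match_line(a: str, b: str) -> str:
    """'|' under identical residues, a space everywhere else."""
    return "".join("|" if x == y and x != "X" else " " for x, y in zip(a, b))


if __name__ == "__main__":
    orf22 = "MIKIKKGLNLPIAGRPEQAV"   # residues 1-20 of ORF22
    orf22a = "MIKIKKGLNLPIAGRPEQVI"  # residues 1-20 of ORF22a
    print(identity_over_overlap(orf22, orf22a))
    print(match_line(orf22, orf22a))
```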

The complete strain B sequence (ORF22-1) and ORF22a show 94.9% identity in 447 aa overlap: 

10 20 30 40 50 60 

orf22a.pep MIKIKKGLNLPIAGRPEQVIYDGPVITEVALLGEEYAGMRPXMKVKEGDAVKKGQVLFED
           ||||||||||||||||||::||||:|||||||||||||||| ||||||||||||||||||
orf22-1    MIKIKKGLNLPIAGRPEQAVYDGPAITEVALLGEEYAGMRPSMKVKEGDAVKKGQVLFED

10 20 30 40 50 60 

70 80 90 100 110 120 

orf22a.pep KKXPGVVFTAPVSGKIAAIHRGEKRVLQSVVIAVEGNDEIEFERYAPEALANLSGXEXXX
           || ||||||||:||||||||||||||||||||||||||||||||||||||||||| |
orf22-1    KKNPGVVFTAPASGKIAAIHRGEKRVLQSVVIAVEGNDEIEFERYAPEALANLSGEEVRR

70 80 90 100 110 120 

130 140 150 160 170 180 

orf22a.pep NLIQSGLWTALRXRPFSKIPAVDAEPFAIFVNAMDTNPLAADPVVVIKEAXXDFRRXXLV
           |||||||||||| ||||||||||||||||||||||||||||||:|:||||  ||:|  ||
orf22-1    NLIQSGLWTALRTRPFSKIPAVDAEPFAIFVNAMDTNPLAADPTVIIKEAAEDFKRGLLV

130 140 150 160 170 180 

190 200 210 220 230 240 

orf22a.pep LSRLTERKIHVCKAAGADVPSENAANIETHEFGGPHPAGLSGTHIHFIEPVGANKTVWTI
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf22-1    LSRLTERKIHVCKAAGADVPSENAANIETHEFGGPHPAGLSGTHIHFIEPVGANKTVWTI

190 200 210 220 230 240 

250 260 270 280 290 300 

orf22a.pep NYQDVIAIGRLFATGRLNTERVIALGGSQVNKPRLLRTVLGAKVSQITAGELVDADNRVI
           ||||||:|||||||||||||||||||||||||||||||||||||||||||||||:|||||
orf22-1    NYQDVITIGRLFATGRLNTERVIALGGSQVNKPRLLRTVLGAKVSQITAGELVDTDNRVI

250 260 270 280 290 300 

310 320 330 340 350 360 

orf22a.pep SGSVLNGAITQGAHDYLGRYHNQISVIEEGRSKELFGWVAPQPDKYSITRTTLGHFLKNK
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf22-1    SGSVLNGAITQGAHDYLGRYHNQISVIEEGRSKELFGWVAPQPDKYSITRTTLGHFLKNK

310 320 330 340 350 360 

370 380 390 400 410 420 

orf22a.pep LFKFTTAVNGGDRAMVPIGTYERVMPLDILPTLLLRDLIVGDTDSAQALGCLELDEEDLA
           ||||:|||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf22-1    LFKFNTAVNGGDRAMVPIGTYERVMPLDILPTLLLRDLIVGDTDSAQALGCLELDEEDLA

370 380 390 400 410 420 

430 440 
orf22a.pep LCSFVCPGKYEXGPLLRKVLETXEKEGX
           ||||||||||| |||||||||| |||||
orf22-1    LCSFVCPGKYEYGPLLRKVLETIEKEGX

430 440 

Further work identified a partial gene sequence <SEQ ID 129> from N. gonorrhoeae, which
encodes the following amino acid sequence <SEQ ID 130; ORF22ng>:



1 MIKIKKGLNL PIAGRPEQVI YDGPAITEVA LLGEEYVGMR PSMKIKEGEA 
51 VKKGQVLFED KKNPGVVFTA PASGKIAAIH RGEKRVLQSV VIAVEGNDEI
101 EFERYVPEAL AKLSSEKVRR NLIQSGLWTA LRTRPFSKIP AVDAEPFAIF 
151 VNAMDTNPLA ADPTVIIKEA AEDFKRGLLV LSRLTERKIH VCKAAGADVP

201 SENAANIETH EFGGPHPAGL SGTHIHFIEP VGANKTVWTI NYQDVIAIGR 

251 LFVTGRLNTE RVVALGGLQV NKPRLLRTVL GAKVSQLTAG ELVDADNRVI

301 SGSVLNGAIA QGAHDYLGRY HN* 

Further work identified complete gonococcal gene <SEQ ID 131>: 

1 ATGATTAAAA TCAAAAAAGG TCTAAATCTG CCCATCGCGG GCAGACCGGA 

51 GCAAGTCATT TATGACGGCC CGGCCATTAC CGAAGTCGCG TTGCTTGGCG 

101 AAGAATATGT CGGCATGCGC CCCTCGATGA AAATCAAGGA AGGTGAAGCC 

151 GTCAAAAAAG GCCAAGTGCT GTTTGAAGAC AAAAAGAATC CGGGCGTAGT 

201 ATTTACTGCG CCGGCTTCAG GCAAAATCGC CGCTATTCAC CGTGGCGAAA 

251 AGCGCGTACT TCAGTCAGTC GTGATTGCCG TTGAAGGCAA CGACGAAATC 

301 GAGTTCGAAC GCTACGTACC TGAAGCGCTG GCAAAATTGA GCAGCGAAAA 

351 AGTGCGCCGC AACCTGATTC AATCAGGCTT ATGGACTGCG CTTCGCACCC 

401 GTCCGTTCAG CAAAATCCCT GCCGTAGATG CCGAGCCGTT CGCCATCTTC 

451 GTCAATGCGA TGGACACCAA TCCGCTGGCT GCCGACCCTA CGGTCATCAT 

501 CAAAGAAGCC GCCGAAGACT TCAAACGCGG CCTGTTGGTA TTGAGCCGCC 

551 TGACCGAACG TAAAATCCAT GTGTGTAAAG CAGCAGGCGC AGACGTGCCG 

601 TCTGAAAATG CTGCCAATAT CGAAACACAT GAATTTGGCG GCCCGCATCC 

651 TGCCGGCTTG AGTGGCACGC ACATTCATTT CATCGAGCCA GTCGGCGCGA 

701 ATAAAACCGT GTGGACCATC AATTATCAAG ACGTGATTGC TATCGGACGT 

751 TTGTTCGTAA CAGGCCGTCT GAATACCGAG CGCGTGGTTG CCTTGGGCGG 

801 CCTGCAAGTC AACAAACCGC GCCTCTTGCG TACCGTTTTG GGTGCGAAGG 

851 TGTCTCAACT TACCGCCGGC GAATTGGTTG ACGCGGACAA CCGCGTGATT 

901 TCCGGTTCGG TATTGAACGG TGCGATTGCA CAAGGCGCGC ATGATTATTT 

951 GGGACGCTAC CACAATCAGA TTTCCGTTAT CGAAGAAGGC CGCAGCAAAG 

1001 AGCTGTTCGG CTGGGTTGCG CCGCAGCCGG ACAAATACTC CATCACGCGC 

1051 ACCACTCTCG GCCATTTCCT AAAAAACAAA CTCTTCAAGT TCACGACAGC 

1101 CGTCAACGGC GGCGACCGCG CCATGGTACC GATCGGCACT TATGAGCGCG 

1151 TAATGCCGTT GGACATCCTG CCTACCTTGC TTTTGCGCGA TTTAATCGTC 

1201 GGCGATACCG ACAGCGCGCA GGCTTTGGGT TGCTTGGAAT TGGACGAAGA 

1251 AGACCTCGCT TTGTGCAGCT TCGTCTGCCC GGGCAAATAC GAATACGGCC 

1301 CGCTGTTGCG CAAAGTGCTG GAAACCATTG AGAAGGAAGG CTGA 

This encodes a protein having amino acid sequence <SEQ ID 132; ORF22ng-1>:

1 MIKIKKGLNL PIAGRPEQVI YDGPAITEVA LLGEEYVGMR PSMKIKEGEA 

51 VKKGQVLFED KKNPGVVFTA PASGKIAAIH RGEKRVLQSV VIAVEGNDEI

101 EFERYVPEAL AKLSSEKVRR NLIQSGLWTA LRTRPFSKIP AVDAEPFAIF 

151 VNAMDTNPLA ADPTVIIKEA AEDFKRGLLV LSRLTERKIH VCKAAGADVP 

201 SENAANIETH EFGGPHPAGL SGTHIHFIEP VGANKTVWTI NYQDVIAIGR 

251 LFVTGRLNTE RVVALGGLQV NKPRLLRTVL GAKVSQLTAG ELVDADNRVI

301 SGSVLNGAIA QGAHDYLGRY HNQISVIEEG RSKELFGWVA PQPDKYSITR 

351 TTLGHFLKNK LFKFTTAVNG GDRAMVPIGT YERVMPLDIL PTLLLRDLIV 

401 GDTDSAQALG CLELDEEDLA LCSFVCPGKY EYGPLLRKVL ETIEKEG* 



The originally-identified partial strain B sequence (ORF22) shows 93.7% identity over a 158aa 
overlap with ORF22ng: 

orf22.pep MIKIKKGLNLPIAGRPEQAVYDGPAITEVALLGEEYAGMRPSMKVKEGDAVKKGQVLFED 60
          ||||||||||||||||||::||||||||||||||||:|||||||:|||:|||||||||||
orf22ng   MIKIKKGLNLPIAGRPEQVIYDGPAITEVALLGEEYVGMRPSMKIKEGEAVKKGQVLFED 60

orf22.pep KKNPGVVFTAPASGKIAAIHRGEKRVLQSVVIAVEXNDEIEFERYAPEALANLSGEEVRR 120
          ||||||||||||||||||||||||||||||||||| |||||||||:|||||:||:|:|||
orf22ng   KKNPGVVFTAPASGKIAAIHRGEKRVLQSVVIAVEGNDEIEFERYVPEALAKLSSEKVRR 120

orf22.pep NLIQSGLWTALRTRPFSKIPAVDAEPFAIFVNAMDTNP 158
          ||||||||||||||||||||||||||||||||||||||
orf22ng   NLIQSGLWTALRTRPFSKIPAVDAEPFAIFVNAMDTNPLAADPTVIIKEAAEDFKRGLLV 180



The complete sequences from strain B (ORF22-1) and gonococcus (ORF22ng) show 96.2% 
identity in 447 aa overlap: 

10 20 30 40 50 60 

orf22-1.pep MIKIKKGLNLPIAGRPEQAVYDGPAITEVALLGEEYAGMRPSMKVKEGDAVKKGQVLFED
            ||||||||||||||||||::||||||||||||||||:|||||||:|||:|||||||||||
orf22ng-1   MIKIKKGLNLPIAGRPEQVIYDGPAITEVALLGEEYVGMRPSMKIKEGEAVKKGQVLFED

10 20 30 40 50 60 

70 80 90 100 110 120 

orf22-1.pep KKNPGVVFTAPASGKIAAIHRGEKRVLQSVVIAVEGNDEIEFERYAPEALANLSGEEVRR
            |||||||||||||||||||||||||||||||||||||||||||||:|||||:||:|:|||
orf22ng-1   KKNPGVVFTAPASGKIAAIHRGEKRVLQSVVIAVEGNDEIEFERYVPEALAKLSSEKVRR

70 80 90 100 110 120 

130 140 150 160 170 180 

orf22-1.pep NLIQSGLWTALRTRPFSKIPAVDAEPFAIFVNAMDTNPLAADPTVIIKEAAEDFKRGLLV
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf22ng-1   NLIQSGLWTALRTRPFSKIPAVDAEPFAIFVNAMDTNPLAADPTVIIKEAAEDFKRGLLV

130 140 150 160 170 180 

190 200 210 220 230 240 

orf22-1.pep LSRLTERKIHVCKAAGADVPSENAANIETHEFGGPHPAGLSGTHIHFIEPVGANKTVWTI
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf22ng-1   LSRLTERKIHVCKAAGADVPSENAANIETHEFGGPHPAGLSGTHIHFIEPVGANKTVWTI

190 200 210 220 230 240 

250 260 270 280 290 300 

orf22-1.pep NYQDVITIGRLFATGRLNTERVIALGGSQVNKPRLLRTVLGAKVSQITAGELVDTDNRVI
            ||||||:|||||:|||||||||:|||| ||||||||||||||||||:|||||||:|||||
orf22ng-1   NYQDVIAIGRLFVTGRLNTERVVALGGLQVNKPRLLRTVLGAKVSQLTAGELVDADNRVI

250 260 270 280 290 300 

310 320 330 340 350 360 

orf22-1.pep SGSVLNGAITQGAHDYLGRYHNQISVIEEGRSKELFGWVAPQPDKYSITRTTLGHFLKNK
            |||||||||:||||||||||||||||||||||||||||||||||||||||||||||||||
orf22ng-1   SGSVLNGAIAQGAHDYLGRYHNQISVIEEGRSKELFGWVAPQPDKYSITRTTLGHFLKNK

310 320 330 340 350 360 

370 380 390 400 410 420 

orf22-1.pep LFKFNTAVNGGDRAMVPIGTYERVMPLDILPTLLLRDLIVGDTDSAQALGCLELDEEDLA
            ||||:|||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf22ng-1   LFKFTTAVNGGDRAMVPIGTYERVMPLDILPTLLLRDLIVGDTDSAQALGCLELDEEDLA

370 380 390 400 410 420 

430 440 
orf22-1.pep LCSFVCPGKYEYGPLLRKVLETIEKEGX
            ||||||||||||||||||||||||||||
orf22ng-1   LCSFVCPGKYEYGPLLRKVLETIEKEGX

430 440 



Computer analysis of these sequences gave the following results: 



Homology with 48kDa outer membrane protein of Actinobacillus pleuropneumoniae (accession number U24492). 
ORF22 and this 48kDa protein show 72% aa identity in 158aa overlap: 

orf22 1 MIKIKKGLNLPIAGRPEQAVYDGPAITEVALLGEEYAGMRPSMKVKEGDAVKKGQVLFED 60

MI IKKGL+LPIAG P Q +++G + EVA+LGEEY GMRPSMKV+EGD VKKGQVLFED
48kDa 1 MITIKKGLDLPIAGTPAQVIHNGNTVNEVAMLGEEYVGMRPSMKVREGDVVKKGQVLFED 60

orf22 61 KKNPGVVFTAPASGKIAAIHRGEKRVLQSVVIAVEXNDEIEFERYAPEALANLSGEEVRR 120

KKNPGVVFTAPASG + I+RGEKRVLQSVVI VE +++I F RY LA+LS E+V++
48kDa 61 KKNPGVVFTAPASGTVVTINRGEKRVLQSVVIKVEGDEQITFTRYEAAQLASLSAEQVKQ 120

orf22 121 NLIQSGLWTALRTRPFSKIPAVDAEPFAIFVNAMDTNP 158

NLI+SGLWTA RTRPFSK+PA+DA P +IFVNAMDTNP
48kDa 121 NLIESGLWTAFRTRPFSKVPALDAIPSSIFVNAMDTNP 158

ORF22a also shows homology to the 48kDa Actinobacillus pleuropneumoniae protein: 

gi|1185395 (U24492) 48 kDa outer membrane protein [Actinobacillus pleuropneumoniae]
Length = 449 



50 



55 



60 



65 



Score = 530 bits (1351), Expect = e-150 
Identities = 274/450 (60%), Positives = 323/450 (70%), Gaps = 4/450 (0%) 

Query: 1 MIKIKKGLNLPIAGRPEQVIYDGPVITEVALLGEEYAGMRPXMKVKEGDAVKKGQVLFED 60
MI IKKGL+LPIAG P QVI++G + EVA+LGEEY GMRP MKV+EGD VKKGQVLFED
Sbjct: 1 MITIKKGLDLPIAGTPAQVIHNGNTVNEVAMLGEEYVGMRPSMKVREGDVVKKGQVLFED 60

Query: 61 KKXPGVVFTAPVSGKIAAIHRGEKRVLQSVVIAVEGNDEIEFERYAPEALANLSGXEXXX 120
KK PGVVFTAP SG + I +RGEKRVLQSVVI VEG+++I F RY LA+LS +
Sbjct: 61

Query: 121
NLI+SGLWTA R RPFSK+PA+DA P +IFVNAMDTNPLAADP VV+KE DF+
Sbjct: 121

Query: 181 LSRL--TERKIHVCKAAGADVP-SENAANIETHEFGGPHPAGLSGTHIHFIEPVGANKTV 237
L+RL ++ +++CK A +++P S I F G HPAGL GTHIHF++PVGA K V
Sbjct: 181

Query: 238
W +NYQDVIAIG+LF TG L T+R+I+L G QV PRL+RT LGA +SQ+TA EL +N
Sbjct: 241

Query: 298
RVISGSVL+GA G DYLGRY Q+SV+ EGR KELFGW+ P DK+SITRT LGHF
Sbjct: 301

Query: 358
K KLF FTTAV+GG+RAMVPIG YERVM GDTDSAQ
Sbjct: 361 K-KLFNFTTAVHGGERAMVPIGAYERVMPLDIIPTLLLRDLAAGDTDSAQNLGCLELDEE 419

Query: 418 XXXXXSFVCPGKYEXGPLLRKVLETXEKEG 447
++VCPGK GP+LR LE EKEG

ORF22ng-1 also shows homology with the OMP from A. pleuropneumoniae:

gi|1185395 (U24492) 48 kDa outer membrane protein [Actinobacillus
pleuropneumoniae] Length = 449

Score = 555 bits (1414), Expect = e-157

Identities = 284/450 (63%), Positives = 337/450 (74%), Gaps = 4/450 (0%)

Query: 27 MIKIKKGLNLPIAGRPEQVIYDGPAITEVALLGEEYVGMRPSMKIKEGEAVKKGQVLFED 86
MI IKKGL+LPIAG P QVI++G + EVA+LGEEYVGMRPSMK++EG+ VKKGQVLFED
Sbjct: 1 MITIKKGLDLPIAGTPAQVIHNGNTVNEVAMLGEEYVGMRPSMKVREGDVVKKGQVLFED 60

Query: 87 KKNPGVVFTAPASGKIAAIHRGEKRVLQSVVIAVEGNDEIEFERYVPEALAKLSSEKVRR 146
KKNPGVVFTAPASG + I +RGEKRVLQSVVI VEG+++I F RY LA LS+E+V++
Sbjct: 61

Query: 147
NLI+SGLWTA RTRPFSK+PA+DA P +IFVNAMDTNPLAADP V++KE DFK GL V
Sbjct: 121

Query: 207
L+RL ++ +++CK A +++P S I F G HPAGL GTHIHF++PVGA K V
Sbjct: 181

Query: 264
W +NYQDVIAIG+LF TG L T+R+++L G QV PRL+RT LGA +SQLTA EL +N
Sbjct: 241

Query: 324
RVISGSVL+GA A G DYLGRY Q+SV+ EGR KELFGW+ P DK+SITRT LGHF
Sbjct: 301

Query: 384
K KLF FTTAV+GG+RAMVPIG YERVM GDTDSAQ
Sbjct: 361

Query: 444
++VCPGK YGP+LR LE I EKEG
Sbjct: 420




Based on this analysis, including the homology with the outer membrane protein of Actinobacillus
pleuropneumoniae, it was predicted that these proteins from N. meningitidis and N. gonorrhoeae,
and their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies.

ORF22-1 (35.4kDa) was cloned in pET and pGEX vectors and expressed in E. coli, as described
above. The products of protein expression and purification were analyzed by SDS-PAGE. Figure
5A shows the results of affinity purification of the GST-fusion protein, and Figure 5B shows the
results of expression of the His-fusion in E. coli. Purified GST-fusion protein was used to immunise
mice, whose sera were used for ELISA (positive result) and FACS analysis (Figure 5C). These 
experiments confirm that ORF22-1 is a surface-exposed protein, and that it is a useful immunogen. 

Example 16 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 133>:

1 . . GCGnCGnAAA TCATCCATCC CC. .nACGTC GTAGGCCCTG AAGCCAACTG 

51 GTTTTTTATG GTAGCCAGTA CGTTTGTGAT TGCTTTGATT GGTTATTTTG 

101 TTACTGAAAA AATCGTCGAA CCGCAATTGG GCCCTTATCA ATCAGATTTG 

151 TCACAAGAAG AAAAAGACAT TCGGCATTCC AATGAAATCA CGCCTTTGGA 

201 ATATAAAGGA TTAATTTGGG CTGGCGTGGT GTTTGTTGCC TTATCCGCCC 

251 TATTGGCTTG GAGCATCGTC CCTGCCGACG GTATTTTGCG TCATCCTGAA 

301 ACAGGATTGG TTTCCGGTTC GCCGTTTTTA AAATCGATTG TTGTTTTTAT 

351 TTTCTTGTTG TTTGCACTGC CGGGCATTGT TTATGGCCGG GTAACCCGAA

401 GTTTGCGCGG CGAACAGGAA GTCGTTAATG CGmyGGCCGA ATCGATGAGT 

451 ACTCTGGsGC TTTmTTTGsw CAkcATCTTT TTTGCCGCAC AGTTTGTCGC

501 ATTTTTTAAT TGGACGAATA TTGGGCAATA TATTGCCGTT AAAGGGGCGA 

551 CGTTCTTAAA AGAAGTCGGC TTGGGCGGCA GCGTGTTGTT TATCGGTTTT 

601 ATTTTAATTT GTGCTTTTAT CAATCTGATG ATAGGCTCCG CCTCCGCGCA 

651 ATGGGCGGTA ACTGCGCCGA TTTTCGTCCC TATGCTGATG TTGGCCGGCT 

701 ACGCGCCCGA AGTCATTCAA GCCGCTTACC GCATCGGTGA TTCCGTTACC 

751 AATATTATTA CGCCGATGAT GAGTTATTTC GGGCTGATTA TGGCGACGGT

801 GrkCmirauTAC AAAAAAGATG CGGGCGTGGG TaCGcTGATT wCTATGATGT 

851 TGCCGTATTC CGCTTTCTTC TTGATTGCgT GGATTGCCTT ATTCTGCATT 

901 TGGGTATTTg TTTTGGGCCT GCCCGTCGGT CCCGGCGCGC CCACATTCTA 

951 TCCCGCACCT TAA 

This corresponds to the amino acid sequence <SEQ ID 134; ORF12>: 

1 ..AXXIIHPXXV VGPEANWFFM VASTFVIALI GYFVTEKIVE PQLGPYQSDL 

51 SQEEKDIRHS NEITPLEYKG LIWAGVVFVA LSALLAWSIV PADGILRHPE

101 TGLVSGSPFL KSIVVFIFLL FALPGIVYGR VTRSLRGEQE VVNAXAESMS

151 TLXLXLXXIF FAAQFVAFFN WTNIGQYIAV KGATFLKEVG LGGSVLFIGF 

201 ILICAFINLM IGSASAQWAV TAPIFVPMLM LAGYAPEVIQ AAYRIGDSVT 

251 NIITPMMSYF GLIMATVXXY KKDAGVGTLI XMMLPYSAFF LIAWIALFCI 

301 WVFVLGLPVG PGAPTFYPAP * 

Further sequence analysis revealed the complete DNA sequence <SEQ ID 135> to be: 

1 ATGAGTCAAA CCGATACGCA ACGGGACGGA CGATTTTTAC GCACAGTCGA 

51 ATGGCTGGGC AATATGTTGC CGCATCCGGT TACGCTTTTT ATTATTTTCA 

101 TTGTGTTATT GCTGATTGCC TCTGCCGTCG GTGCGTATTT CGGACTATCC 

151 GTCCCCGATC CGCGCCCTGT TGGTGCGAAA GGACGTGCCG ATGACGGTTT 

201 GATTTACATT GTCAGCCTGC TCAATGCCGA CGGTTTTATC AAAATCCTGA 

251 CGCATACCGT TAAAAATTTC ACCGGTTTCG CGCCGTTGGG AACGGTGTTG 

301 GTTTCTTTAT TGGGCGTGGG GATTGCGGAA AAATCGGGCT TGATTTCCGC 

351 ATTAATGCGC TTATTGCTCA CAAAATCGCC ACGCAAACTC ACTACTTTTA 

401 TGGTTGTTTT TACAGGGATT TTATCTAATA CCGCTTCTGA ATTGGGCTAT 

451 GTCGTCCTAA TCCCTTTGTC CGCCATCATC TTTCATTCCC TCGGCCGCCA 

501 TCCGCTTGCC GGTCTGGCTG CGGCTTTCGC CGGCGTTTCG GGCGGTTATT 



551 CGGCCAATCT GTTCTTAGGC ACAATCGATC CGCTCTTGGC AGGCATCACC

601 CAACAGGCGG CGCAAATCAT CCATCCCGAC TACGTCGTAG GCCCTGAAGC

651 CAACTGGTTT TTTATGGTAG CCAGTACGTT TGTGATTGCT TTGATTGGTT

701 ATTTTGTTAC TGAAAAAATC GTCGAACCGC AATTGGGCCC TTATCAATCA

751 GATTTGTCAC AAGAAGAAAA AGACATTCGG CATTCCAATG AAATCACGCC

801 TTTGGAATAT AAAGGATTAA TTTGGGCTGG CGTGGTGTTT GTTGCCTTAT

851 CCGCCCTATT GGCTTGGAGC ATCGTCCCTG CCGACGGTAT TTTGCGTCAT

901 CCTGAAACAG GATTGGTTTC CGGTTCGCCG TTTTTAAAAT CGATTGTTGT

951 TTTTATTTTC TTGTTGTTTG CACTGCCGGG CATTGTTTAT GGCCGGGTAA

1001 CCCGAAGTTT GCGCGGCGAA CAGGAAGTCG TTAATGCGAT GGCCGAATCG

1051 ATGAGTACTC TGGGGCTTTA TTTGGTCATC ATCTTTTTTG CCGCACAGTT

1101 TGTCGCATTT TTTAATTGGA CGAATATTGG GCAATATATT GCCGTTAAAG

1151 GGGCGACGTT CTTAAAAGAA GTCGGCTTGG GCGGCAGCGT GTTGTTTATC

1201 GGTTTTATTT TAATTTGTGC TTTTATCAAT CTGATGATAG GCTCCGCCTC

1251 CGCGCAATGG GCGGTAACTG CGCCGATTTT CGTCCCTATG CTGATGTTGG

1301 CCGGCTACGC GCCCGAAGTC ATTCAAGCCG CTTACCGCAT CGGTGATTCC

1351 GTTACCAATA TTATTACGCC GATGATGAGT TATTTCGGGC TGATTATGGC

1401 GACGGTGATC AAATACAAAA AAGATGCGGG CGTGGGTACG CTGATTTCTA

1451 TGATGTTGCC GTATTCCGCT TTCTTCTTGA TTGCGTGGAT TGCCTTATTC

1501 TGCATTTGGG TATTTGTTTT GGGCCTGCCC GTCGGTCCCG GCGCGCCCAC

1551 ATTCTATCCC GCACCTTAA



This corresponds to the amino acid sequence <SEQ ID 136; ORF12-1>:



1 MSQTDTQRDG RFLRTVEWLG NMLPHPVTLF IIFIVLLLIA SAVGAYFGLS

51 VPDPRPVGAK GRADDGLIYI VSLLNADGFI KILTHTVKNF TGFAPLGTVL

101 VSLLGVGIAE KSGLISALMR LLLTKSPRKL TTFMVVFTGI LSNTASELGY

151 VVLIPLSAII FHSLGRHPLA GLAAAFAGVS GGYSANLFLG TIDPLLAGIT

201 QQAAQIIHPD YVVGPEANWF FMVASTFVIA LIGYFVTEKI VEPQLGPYQS

251 DLSQEEKDIR HSNEITPLEY KGLIWAGVVF VALSALLAWS IVPADGILRH

301 PETGLVSGSP FLKSIVVFIF LLFALPGIVY GRVTRSLRGE QEVVNAMAES

351 MSTLGLYLVI IFFAAQFVAF FNWTNIGQYI AVKGATFLKE VGLGGSVLFI

401 GFILICAFIN LMIGSASAQW AVTAPIFVPM LMLAGYAPEV IQAAYRIGDS

451 VTNIITPMMS YFGLIMATVI KYKKDAGVGT LISMMLPYSA FFLIAWIALF

501 CIWVFVLGLP VGPGAPTFYP AP*



Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A) 

ORF12 shows 96.3% identity over a 320aa overlap with an ORF (ORF12a) from strain A of N. 
meningitidis: 

10 20 30 

orf12.pep                               AXXIIHPXXVVGPEANWFFMVASTFVIALI
                                        |  ||||  |||||||||||||||||||||
orf12a    AAAFAGVSGGYSANLFLGTIDPLLAGITQQAAQIIHPDYVVGPEANWFFMVASTFVIALI
180 190 200 210 220 230 

40 50 60 70 80 90 

orf12.pep GYFVTEKIVEPQLGPYQSDLSQEEKDIRHSNEITPLEYKGLIWAGVVFVALSALLAWSIV
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf12a    GYFVTEKIVEPQLGPYQSDLSQEEKDIRHSNEITPLEYKGLIWAGVVFVALSALLAWSIV
240 250 260 270 280 290 

100 110 120 130 140 150 

orf12.pep PADGILRHPETGLVSGSPFLKSIVVFIFLLFALPGIVYGRVTRSLRGEQEVVNAXAESMS
          |||||||||||||||||||||||||||||||||||||||||||||||||||||| |||||
orf12a    PADGILRHPETGLVSGSPFLKSIVVFIFLLFALPGIVYGRVTRSLRGEQEVVNAMAESMS

300 310 320 330 340 350 

160 170 180 190 200 210 

orf12.pep TLXLXLXXIFFAAQFVAFFNWTNIGQYIAVKGATFLKEVGLGGSVLFIGFILICAFINLM
          || | |  ||||||||||||||||||||||||||||||||||||||||||||||||||||
orf12a    TLGLYLVIIFFAAQFVAFFNWTNIGQYIAVKGATFLKEVGLGGSVLFIGFILICAFINLM

360 370 380 390 400 410 

220 230 240 250 260 270 

orf12.pep IGSASAQWAVTAPIFVPMLMLAGYAPEVIQAAYRIGDSVTNIITPMMSYFGLIMATVXXY
          |||||||||||||||||||||||||||||||||||||||||||||||||||||||||  |
orf12a    IGSASAQWAVTAPIFVPMLMLAGYAPEVIQAAYRIGDSVTNIITPMMSYFGLIMATVIKY

420 430 440 450 460 470 

280 290 300 310 320 

orf12.pep KKDAGVGTLIXMMLPYSAFFLIAWIALFCIWVFVLGLPVGPGAPTFYPAPX
          |||||||||| ||||||||||||||||||||||||||||||||||||||||
orf12a    KKDAGVGTLISMMLPYSAFFLIAWIALFCIWVFVLGLPVGPGAPTFYPAPX
480 490 500 510 520 
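The partial ORF12 covers only the C-terminal part of ORF12a. Before identity can be counted, the two sequences must be placed in register. A minimal sketch, assuming a gap-free alignment, slides the shorter sequence along the longer one. It keeps the offset with the most identical residues, and X never counts as a match. `best_offset` is a hypothetical helper, not part of any named tool.

```python
def best_offset(partial: str, full: str) -> tuple[int, int]:
    """Return (0-based offset in `full`, matches) for the best gap-free fit."""
    if len(partial) > len(full):
        raise ValueError("partial sequence is longer than the full sequence")
    best_pos, best_matches = 0, -1
    for offset in range(len(full) - len(partial) + 1):
        window = full[offset:offset + len(partial)]
        matches = sum(1 for x, y in zip(partial, window) if x == y and x != "X")
        if matches > best_matches:
            best_pos, best_matches = offset, matches
    return best_pos, best_matches


if __name__ == "__main__":
    # Start of ORF12 against residues 201-211 of ORF12a.
    print(best_offset("AXXIIHP", "QQAAQIIHPDY"))
```

Read the same way, the alignment above places residue 1 of ORF12 (320 residues) at residue 203 of ORF12a (522 residues), so the two sequences end together. The 12 non-identical positions are all X in ORF12, giving 308/320 = 96.25%, which is the 96.3% quoted.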

The complete length ORF12a nucleotide sequence <SEQ ID 137> is: 



1 ATGAGTCAAA CCGATACGCA ACGGGACGGA CGATTTTTAC GCACAGTCGA

51 ATGGCTGGGC AATATGTTGC CGCACCCGGT TACGCTTTTT ATTATTTTCA

101 TTGTGTTATT GCTGATTGCC TCTGCCGCCG GTGCGTATTT CGGACTATCC

151 GTCCCCGATC CGCGCCCTGT TGGTGCGAAA GGACGTGCCG ATGACGGTTT

201 GATTCACGTT GTCAGCCTGC TCGATGCTGA CGGTTTGATC AAAATCCTGA

251 CGCATACCGT TAAAAATTTC ACCGGTTTCG CGCCGTTGGG AACGGTGTTG

301 GTTTCTTTAT TGGGCGTGGG GATTGCGGAA AAATCGGGCT TGATTTCCGC

351 ATTAATGCGC TTATTGCTCA CAAAATCTCC ACGCAAACTC ACTACTTTTA

401 TGGTTGTTTT TACAGGGATT TTATCTAATA CCGCTTCTGA ATTGGGCTAT

451 GTCGTCCTAA TCCCTTTGTC CGCCATCATC TTTCATTCCC TCGGCCGCCA

501 TCCGCTTGCC GGTCTGGCTG CGGCTTTCGC CGGCGTTTCG GGCGGTTATT

551 CGGCCAATCT GTTCTTAGGC ACAATCGATC CGCTCTTGGC AGGCATCACC

601 CAACAGGCGG CGCAAATCAT CCATCCCGAC TACGTCGTAG GCCCTGAAGC

651 CAACTGGTTT TTTATGGTAG CCAGTACGTT TGTGATTGCT TTGATTGGTT

701 ATTTTGTTAC TGAAAAAATC GTCGAACCGC AATTGGGCCC TTATCAATCA

751 GATTTGTCAC AAGAAGAAAA AGACATTCGA CATTCCAATG AAATCACGCC

801 TTTGGAATAT AAAGGATTAA TTTGGGCTGG CGTGGTGTTT GTTGCCTTAT

851 CCGCCCTATT GGCTTGGAGC ATCGTCCCTG CCGACGGTAT TTTGCGTCAT

901 CCTGAAACAG GATTGGTTTC CGGTTCGCCG TTTTTAAAAT CAATTGTTGT

951 TTTTATTTTC TTGTTGTTTG CACTGCCGGG CATTGTTTAT GGCCGGGTAA

1001 CCCGAAGTTT GCGCGGCGAA CAGGAAGTCG TTAATGCGAT GGCCGAATCG

1051 ATGAGTACTC TGGGGCTTTA TTTGGTCATC ATCTTTTTTG CCGCACAGTT

1101 TGTCGCATTT TTTAATTGGA CGAATATTGG GCAATATATT GCCGTTAAAG

1151 GGGCGACGTT CTTAAAAGAA GTCGGCTTGG GCGGCAGCGT GTTGTTTATC

1201 GGTTTTATTT TAATTTGTGC TTTTATCAAT CTGATGATAG GCTCCGCCTC

1251 CGCGCAATGG GCGGTAACTG CGCCGATTTT CGTCCCTATG CTGATGTTGG

1301 CCGGCTACGC GCCCGAAGTC ATTCAAGCCG CTTACCGCAT CGGTGATTCC

1351 GTTACCAATA TTATTACGCC GATGATGAGT TATTTCGGGC TGATTATGGC

1401 GACGGTGATC AAATACAAAA AAGATGCGGG CGTGGGTACG CTGATTTCTA

1451 TGATGTTGCC GTATTCCGCT TTCTTCTTGA TTGCGTGGAT TGCCTTATTC

1501 TGCATTTGGG TATTTGTTTT GGGCCTGCCC GTCGGTCCCG GCGCGCCCAC

1551 ATTCTATCCC GCACCTTAA



This encodes a protein having amino acid sequence <SEQ ID 138>: 



1 MSQTDTQRDG RFLRTVEWLG NMLPHPVTLF IIFIVLLLIA SAAGAYFGLS

51 VPDPRPVGAK GRADDGLIHV VSLLDADGLI KILTHTVKNF TGFAPLGTVL

101 VSLLGVGIAE KSGLISALMR LLLTKSPRKL TTFMVVFTGI LSNTASELGY

151 VVLIPLSAII FHSLGRHPLA GLAAAFAGVS GGYSANLFLG TIDPLLAGIT

201 QQAAQIIHPD YVVGPEANWF FMVASTFVIA LIGYFVTEKI VEPQLGPYQS

251 DLSQEEKDIR HSNEITPLEY KGLIWAGVVF VALSALLAWS IVPADGILRH

301 PETGLVSGSP FLKSIVVFIF LLFALPGIVY GRVTRSLRGE QEVVNAMAES

351 MSTLGLYLVI IFFAAQFVAF FNWTNIGQYI AVKGATFLKE VGLGGSVLFI

401 GFILICAFIN LMIGSASAQW AVTAPIFVPM LMLAGYAPEV IQAAYRIGDS

451 VTNIITPMMS YFGLIMATVI KYKKDAGVGT LISMMLPYSA FFLIAWIALF

501 CIWVFVLGLP VGPGAPTFYP AP*



ORF12a and ORF12-1 show 99.0% identity in 522 aa overlap: 

10 20 30 40 50 60 

orf12a.pep MSQTDTQRDGRFLRTVEWLGNMLPHPVTLFIIFIVLLLIASAAGAYFGLSVPDPRPVGAK
I I I I I I II I I I I I I I I I I I I II I I I II I I I I I II I I I I I I II : II I I I I I I I I I I I I I II 
orf12-1    MSQTDTQRDGRFLRTVEWLGNMLPHPVTLFIIFIVLLLIASAVGAYFGLSVPDPRPVGAK

10 20 30 40 50 60 

70 80 90 100 110 120 

orf12a.pep GRADDGLIHVVSLLDADGLIKILTHTVKNFTGFAPLGTVLVSLLGVGIAEKSGLISALMR
I III I II !::M I l:IM:IMI I I I M I I I I II II I I I I II I Ml I I I III IN I II! I 
orf12-1    GRADDGLIYIVSLLNADGFIKILTHTVKNFTGFAPLGTVLVSLLGVGIAEKSGLISALMR



70 80 90 100 110 120

130 140 150 160 170 180 

orf12a.pep LLLTKSPRKLTTFMVVFTGILSNTASELGYVVLIPLSAIIFHSLGRHPLAGLAAAFAGVS
I I | I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I 
orf12-1    LLLTKSPRKLTTFMVVFTGILSNTASELGYVVLIPLSAIIFHSLGRHPLAGLAAAFAGVS

130 140 150 160 170 180 

190 200 210 220 230 240 

orf12a.pep GGYSANLFLGTIDPLLAGITQQAAQIIHPDYVVGPEANWFFMVASTFVIALIGYFVTEKI

I | | M I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I M I I I I I I I I ! I I I I I I I I I II 
orf12-1    GGYSANLFLGTIDPLLAGITQQAAQIIHPDYVVGPEANWFFMVASTFVIALIGYFVTEKI

190 200 210 220 230 240 

250 260 270 280 290 300 

orf12a.pep VEPQLGPYQSDLSQEEKDIRHSNEITPLEYKGLIWAGVVFVALSALLAWSIVPADGILRH
I I I I I I I I I I I I I II I I I I III I II I I I I I II I I I II I I I I I I I I I I I I I I I I I I I I I I I 
orf12-1    VEPQLGPYQSDLSQEEKDIRHSNEITPLEYKGLIWAGVVFVALSALLAWSIVPADGILRH

250 260 270 280 290 300 

310 320 330 340 350 360 

orf12a.pep PETGLVSGSPFLKSIVVFIFLLFALPGIVYGRVTRSLRGEQEVVNAMAESMSTLGLYLVI
I I I I I I I I I I II I I I I I I I I I II I I I I I I I I II II II I I I I I I II I I J I I I I II I I I I I I 
orf12-1    PETGLVSGSPFLKSIVVFIFLLFALPGIVYGRVTRSLRGEQEVVNAMAESMSTLGLYLVI

310 320 330 340 350 360 

370 380 390 400 410 420 

orf12a.pep IFFAAQFVAFFNWTNIGQYIAVKGATFLKEVGLGGSVLFIGFILICAFINLMIGSASAQW
I I I'll I II Ml MM I MM III I II II I I II ill INI I I Ml I II I III II I I Ml II 
orf12-1    IFFAAQFVAFFNWTNIGQYIAVKGATFLKEVGLGGSVLFIGFILICAFINLMIGSASAQW

370 380 390 400 410 420 

430 440 450 460 470 480 

orf12a.pep AVTAPIFVPMLMLAGYAPEVIQAAYRIGDSVTNIITPMMSYFGLIMATVIKYKKDAGVGT
I I I I [ I I I I I I I I I I I I I I I I II I I I I I I I I II I I I I 1 I I I I I I II I M I I I I I I I I I II 
orf 12-1 AVTAPIFVPMLMLAGYAPEVIQAAYRIGDSVTNIITPMMSYFGLIMATVIKYKKDAGVGT 

430 440 450 460 470 480 

490 500 510 520 

orf12a.pep LISMMLPYSAFFLIAWIALFCIWVFVLGLPVGPGAPTFYPAPX
I I I II I I I I I I II I I II II I I I I I I I I II I II I I I I I II I I I I 
orf 12-1 LISMMLPYSAFFLIAWIALFCIWVFVLGLPVGPGAPTFYPAPX 

490 500 510 520 
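The identity figures quoted in these comparisons are the number of identical aligned residues divided by the length of the overlap. The sketch below illustrates that arithmetic for a pre-aligned, ungapped pair; it is not necessarily the program used to produce the alignments in this specification, and the function name `percent_identity` is hypothetical. It uses residues 1-120 of ORF12a and ORF12-1, the region in which all five differences between the two printed sequences occur.

```python
def percent_identity(seq_a: str, seq_b: str) -> tuple[int, int, float]:
    """Count identical positions in two pre-aligned, equal-length sequences.

    Assumes an ungapped alignment: position i of seq_a is compared with
    position i of seq_b. Returns (identities, overlap, percent identity).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    identities = sum(1 for a, b in zip(seq_a, seq_b) if a == b)
    overlap = len(seq_a)
    return identities, overlap, 100.0 * identities / overlap


# Residues 1-120 of ORF12a and ORF12-1, copied from the alignment above.
orf12a = ("MSQTDTQRDGRFLRTVEWLGNMLPHPVTLFIIFIVLLLIASAAGAYFGLSVPDPRPVGAK"
          "GRADDGLIHVVSLLDADGLIKILTHTVKNFTGFAPLGTVLVSLLGVGIAEKSGLISALMR")
orf12_1 = ("MSQTDTQRDGRFLRTVEWLGNMLPHPVTLFIIFIVLLLIASAVGAYFGLSVPDPRPVGAK"
           "GRADDGLIYIVSLLNADGFIKILTHTVKNFTGFAPLGTVLVSLLGVGIAEKSGLISALMR")

ident, overlap, pct = percent_identity(orf12a, orf12_1)
print(f"{ident}/{overlap} identical ({pct:.1f}%)")  # 115/120 identical (95.8%)
```

Over all 522 aligned positions the same five differences give 517/522 = 99.04%, which agrees with the 99.0% identity reported for ORF12a and ORF12-1.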



Homology with a predicted ORF from N. gonorrhoeae

ORF12 shows 92.5% identity over a 320aa overlap with a predicted ORF (ORF12.ng) from N. 
gonorrhoeae: 



orf12.pep                                 AXXIIHPXXVVGPEANWFFMVASTFVIALI 30
I 11 I I I I I I II I I I I I : \ I I I I I I I I
orf12ng   AAAFAGVSGGYSANLFLGTIDPLLAGITQQAAQIIHPDYVVGPEANWFFMAASTFVIALI 232

orf12.pep GYFVTEKIVEPQLGPYQSDLSQEEKDIRHSNEITPLEYKGLIWAGVVFVALSALLAWSIV 90
I I II I I I I I I M II II I M I I I I I I I I II I I I II I I M 11 I II 1 II I M I II I II I I II I
orf12ng   GYFVTEKIVEPQLGPYQSDLSQEEKDIRHSNEITPLEYKGLIWAGVVFVALSALLAWSIV 292

orf12.pep PADGILRHPETGLVSGSPFLKSIVVFIFLLFALPGIVYGRVTRSLRGEQEVVNAXAESMS 150
I I I I I I I I I I I I I I : I I I I I I I I I I I I I I I I I I I II I I I I : I II I I I I : I I I I I Mill
orf12ng   PADGILRHPETGLVAGSPFLKSIVVFIFLLFALPGIVYGRITRSLRGEREVVNAMAESMS 352

orf12.pep TLXLXLXXIFFAAQFVAFFNWTNIGQYIAVKGATFLKEVGLGGSVLFIGFILICAFINLM 210
II I I II II III II I MINI III Ml I M:MI: I I I I I I I I I I II I I II I I I I
orf12ng   TLGLYLVIIFFAAQFVAFFNWTNIGQYIAVKGAVFLKKFRLGGSVLFIGFILICAFINLM 412

orf12.pep IGSASAQWAVTAPIFVPMLMLAGYAPEVIQAAYRIGDSVTNIITPMMSYFGLIMATVXXY 270
I I II I I M I I M 1 I I I M I I II 1 I I : I 1 I 1 II I It 1 M I I I I I I I I I II I I II I II I
orf12ng   IGSASAQWAVTAPIFVPMLMLAGNAPQVIQAAYRIGDSVTNIITPMMSYFGLIMATVIKY 472

orf12.pep KKDAGVGTLIXMMLPYSAFFLIAWIALFCIWVFVLGLPVGPGAPTFYPAP 320
I | | | I I I I I I I I I I I I.I I I I I I I I I I I I I I I I I I I I I I I I I : I I I I I : I
orf12ng   KKDAGVGTLISMMLPYSAFFLIAWIALFCIWVFVLGLPVGPGTPTFYPVP 522

The complete length ORF12ng nucleotide sequence <SEQ ID 139> is: 

1 ATGAGTCAAA CCGACGCGCG TCGTAGCGGA CGATTTTTAC GCACAGTCGA 

51 ATGGCTGGGC AATATGTTGC CGCACCCGGT TACGCTTTTT ATTATTTTCA 

101 TTGTGTTATT GCTGATTGcc tCtgCCGTCG GTGCGTATTT CGGACTATCC 

151 GTCCCCGATC CGCGTCCTGT TGGGGCGAAA GGACGTGCCG ATGACGGTTT 

201 GATTCACGTT GTCAGCCTGC TCGATGCCGA CGGTTTGATC AAAATCCTGA 

251 CGCATACCGT TAAAAATTTC ACCGGTTTCG CGCCGTTGGG AACGGTGTTG 

301 GTTTCTTTAT TGGGCGTGGG GATTGCGGAA AAATCGGGCT TGATTTCCGC 

351 ATTAATGCGC TTATTGCTCA CAAAATCCCC ACGCAAACTC ACTACTTTTA 

401 TGGTTGTTTT TACAGGGATT TTATCCAATA CGGCTTCTGA ATTGGGCTAT 

451 GTCGTCCTAA TCCCTTTGTC CGCCGTCATC TTTCATTCGC TCGGCCGCCA 

501 TCCGCTTGCC GGTTTGGCTG CGGCTTTCGC CGGCGTTTCG GGCGGTTATT 

551 CGGCCAATCT GTTCTTAGGC ACAATCGATC CGCTCTTGGC AGGCATCACC 

601 CAACAGGCGG CGCAAATCAT CCATCCCGAC TACGTCGTAG GCCCTGAAGC 

651 CAACTGGTTT TTTATGGCAG CCAGTACGTT TGTGATTGCT TTGATTGGTT 

701 ATTTTGTTAC TGAAAAAATC GTCGAACCGC AATTGGGCCC TTATCAATCA 

751 GATTTGTCAC AAGAAGAAAA AGACATTCGG CATTCCAATG AAATCACGCC 

801 TTTGGAATAT AAAGGATTAA TTTGGGCAGG CGTGGTGTTT GTTGCCTTAT 

851 CCGCCCTATT GGCTTGGAGC ATCGTCCCTG CCGACGGTAT TTTGCGTCAT 

901 CCTGAAACAG GATTGGTTGC CGGTTCGCCG TTTTTAAAAT CGATTGTTGT 

951 TTTTATTTTC TTGTTGTTTG CGCTGCCGGG CATTGTTTAT GGCCGGATAA 

1001 CCCGAAGTTT GCGCGGCGAA CGGGAAGTCG TTAATGCGAT GGCCGAATCG 

1051 ATGAGTACTT TGGGACTTTA TTTGGTCATC ATCTTTTTTG CCGCACAGTT 

1101 TGTCGCATTT TTTAATTGGA CGAATATTGG GCAATATATT GCCGTTAAAG 

1151 GGGCGGTGTT CTTAAAAGAA GTCGGCTTGG GCGGCAGTGT GTTGTTTATC 

1201 GGTTTTATTT TAATTTGTGC TTTTATCAAT CTGATGATAG GCTCCGCCTC 

1251 CGCGCAATGG GCGGTAACTG CGCCGATTTT CGTCCCTATG CTGATGTTGG 

1301 CCGGCTACGC GCCCGAAGTC ATTCAAGCCG CTTACCGCAT CGGTGATTCC 

1351 GTTACCAATA TTATTACGCC GATGATGAGT TATTTCGGGC TGATTATGGC 

1401 GACGGTAATC AAATACAAAA AAGATGCGGG CGTAGGCACG CTGATTTCTA 

1451 TGATGTTGCC GTATTCCGCT TTCTTCTTAA TTGCATGGAT CGCCTTATTC 

1501 TGCATTTGGG TATTTGTTTT GGGTCTGCCC GTCGGTCCCG GCACACCCAC 

1551 ATTCTATCCG GTGCCTTAA 

This encodes a protein having amino acid sequence <SEQ ID 140>: 



1 MSQTDARRSG RFLRTVEWLG NMLPHPVTLF IIFIVLLLIA SAVGAYFGLS

51 VPDPRPVGAK GRADDGLIHV VSLLDADGLI KILTHTVKNF TGFAPLGTVL

101 VSLLGVGIAE KSGLISALMR LLLTKSPRKL TTFMVVFTGI LSNTASELGY

151 VVLIPLSAVI FHSLGRHPLA GLAAAFAGVS GGYSANLFLG TIDPLLAGIT

201 QQAAQIIHPD YVVGPEANWF FMAASTFVIA LIGYFVTEKI VEPQLGPYQS

251 DLSQEEKDIR HSNEITPLEY KGLIWAGVVF VALSALLAWS IVPADGILRH

301 PETGLVAGSP FLKSIVVFIF LLFALPGIVY GRITRSLRGE REVVNAMAES

351 MSTLGLYLVI IFFAAQFVAF FNWTNIGQYI AVKGAVFLKK FRLGGSVLFI

401 GFILICAFIN LMIGSASAQW AVTAPIFVPM LMLAGNAPQV IQAAYRIGDS

451 VTNIITPMMS YFGLIMATVI KYKKDAGVGT LISMMLPYSA FFLIAWIALF

501 CIWVFVLGLP VGPGTPTFYP VP*
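Each amino acid sequence in this specification is the conceptual translation of the corresponding nucleotide sequence. As a sketch only (the specification does not say which software performed the translations), the Python below translates a reading frame with the standard genetic code, whose amino-acid assignments are the same as those of the bacterial code (NCBI translation table 11). The names `translate` and `CODON_TABLE` are illustrative. Applied to the first 30 nucleotides of <SEQ ID 139>, it reproduces the first ten residues of <SEQ ID 140>.

```python
from itertools import product

# Standard genetic code. Codons are enumerated in TCAG order, so this
# 64-letter string lines up with product("TCAG", repeat=3); '*' = stop.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(nt: str) -> str:
    """Translate a DNA reading frame that starts at its first base.

    Spaces are ignored, an incomplete trailing codon is dropped, and codons
    containing ambiguity codes (for example N) are translated as 'X'.
    """
    seq = nt.replace(" ", "").upper()
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, len(seq) - 2, 3))


# First 30 nucleotides of ORF12ng <SEQ ID 139>.
print(translate("ATGAGTCAAA CCGACGCGCG TCGTAGCGGA"))  # MSQTDARRSG
```

Likewise, `translate("TTCTATCCGGTGCCTTAA")`, the last six codons of <SEQ ID 139>, gives `FYPVP*`, matching the C-terminus of <SEQ ID 140>.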

ORF12ng shows 97.1% identity in 522 aa overlap with ORF12-1: 



10 20 30 40 50 60 

orf12-1.pep MSQTDTQRDGRFLRTVEWLGNMLPHPVTLFIIFIVLLLIASAVGAYFGLSVPDPRPVGAK
I I I I I :: I : I I I I I I I I I I I I I I I I t I I I I M I I I I I I I I I M I I I I II I I I I I I I I I I 1 
orf12ng     MSQTDARRSGRFLRTVEWLGNMLPHPVTLFIIFIVLLLIASAVGAYFGLSVPDPRPVGAK

10 20 30 40 50 60 



70 80 90 100 110 120 

orf12-1.pep GRADDGLIYIVSLLNADGFIKILTHTVKNFTGFAPLGTVLVSLLGVGIAEKSGLISALMR

M I MM Mhlll:|l I IN I Ml I I II M li I I I II Mil I II I I I I 

orf12ng     GRADDGLIHVVSLLDADGLIKILTHTVKNFTGFAPLGTVLVSLLGVGIAEKSGLISALMR

70 80 90 100 110 120 

130 140 150 160 170 180 

orf12-1.pep LLLTKSPRKLTTFMVVFTGILSNTASELGYVVLIPLSAIIFHSLGRHPLAGLAAAFAGVS
I M I I I I I 1 M I I I I i i M I I I I I I I I I I I I I I M I 1 I : I I I I I I I I I I I I I I I I I I I I I 
orf12ng     LLLTKSPRKLTTFMVVFTGILSNTASELGYVVLIPLSAVIFHSLGRHPLAGLAAAFAGVS



WO 99/24578 






PCT/IB98/01665 



130 140 150 160 170 180

190 200 210 220 230 240 

orf12-1.pep GGYSANLFLGTIDPLLAGITQQAAQIIHPDYVVGPEANWFFMVASTFVIALIGYFVTEKI
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I : I I I I I I I I I I I I I I I I I 
orf12ng     GGYSANLFLGTIDPLLAGITQQAAQIIHPDYVVGPEANWFFMAASTFVIALIGYFVTEKI

190 200 210 220 230 240 

250 260 270 280 290 300 

orf 12-1. pep VEPQLGPYQSDLSQEEKDIRHSNEITPLEYKGLIWAGVVFVALSALLAWSIVPADGILRH 
I I I I I I I I I I I t I I i I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 1 I I I I I I 
orf12ng     VEPQLGPYQSDLSQEEKDIRHSNEITPLEYKGLIWAGVVFVALSALLAWSIVPADGILRH

250 260 270 280 290 300 

310 320 330 340 350 360 

orf12-1.pep PETGLVSGSPFLKSIVVFIFLLFALPGIVYGRVTRSLRGEQEVVNAMAESMSTLGLYLVI
I I I I I I : I I I I I I I I I I I I I I I I I I I I I I I I I : I I I I I I I : I I I I I I I I I I I I I I I I I I I 
orf12ng     PETGLVAGSPFLKSIVVFIFLLFALPGIVYGRITRSLRGEREVVNAMAESMSTLGLYLVI

310 320 330 340 350 360 

370 380 390 400 410 420 

orf 12-1. pep IFFAAQFVAFFNWTNIGQYIAVKGATFLKEVGLGGSVLFIGFILICAFINLMIGSASAQW 
I I I I I I I I I I I I I I I I I I I I I I ! I I : I I I I I I I I I I I I I I I I I I I I I I I ! I I I I I I I I I I 
orfl2ng IFFAAQFVAFFNWTNIGQYIAVKGAVFLKEVGLGGSVLFIGFILICAFINLMIGSASAQW 

370 380 390 400 410 420 

430 440 450 460 470 480 

orf 12-1. pep AVTAPIFVPMLMLAGYAPEVIQAAYRIGDSVTNIITPMMSYFGLIMATVIKYKKDAGVGT 
I I I I I I I I I I I I I I I I I I I I I I I i I I II I I I I I I I I I I I I I I I II I I I I I I I I I I M I I I 
orf12ng     AVTAPIFVPMLMIAGYAPEVIQAAYRIGDSVTNIITPMMSYFGLIMATVIKYKKDAGVGT

430 440 450 460 470 480 

490 500 510 520 

orf 12-1 .pep LISMMLPYSAFFLIAWIALFCIWVFVLGLPVGPGAPTFYPAPX 
I I I I I I I I I I I I I I 1 I I I I I I I I II I I I I I I I II : I I I I I : I I 
orfl2ng LISMMLPYSAFFLIAWIALFCIWVFVLGLPVGPGTPTFYPVPX 

490 500 510 520 

In addition, ORF12ng shows significant homology with a hypothetical protein from E. coli:

sp|P46133|YDAH_ECOLI HYPOTHETICAL 55.1 KD PROTEIN IN OGT-DBPA INTERGENIC REGION 
>gi|1787597 (AE000231) hypothetical protein in ogt 5' region [Escherichia coli]
Length = 510
Score = 329 bits (835), Expect = 2e-89

Identities = 178/507 (35%), Positives = 281/507 (55%), Gaps = 15/507 (2%)

Query: 8   RSGRFLRTVEWLGNMLPHPVTXXXXXXXXXXXASAVGAYFGLSVPDPRPVGAKGRADDGL 67
            +SG+ VE +GN +PHP +A+ + FG+S +P ' D
Sbjct: 13  QSGKLYGWVERIGNKVPHPFLLFIYLIIVLMVTTAILSAFGVSAKNP TDGTP 64

Query: 68  IHVVSLLDADGLIKILTHTVKNFTGFAPXXXXXXXXXXXXIAEKSGLISALMRLLLTKSP 127
            + V +LL +GL L + +KNF+GFAP +AE+ GL+ ALM + +
Sbjct: 65  VWKNLLSVEGLHWFLPNVIKNFSGFAPLGAILALVLGAGLAERVGLLPALMVKMASHVN 124


Query: 


128 


Sbjct: 


125 


Query: 


188 


Sbjct: 


185 


Query: 


248 


Sbjct: 


245 


Query: 


308 


Sbjct: 


299 


Query: 


368 



+ ++MV+F 



S+ +S+ V++ P+ A+IF ++GRHP+AGL AA AGV G++ANL 



+ T D LL+GI+ +AA -HP 



+Q + 



++ + + 



NW+FMA+S V+ ++G +T+KI+EP+LG 



GL AGW + A +A ++P +GILR P V 
-GLRIAGWSLLFIAAIALMVIPQNGILRDPINHTVM 298 



SPF+K IV I L F + + YG TR++R + ++ + M E M + ++ 



NW+N+G++IAV 



L+ GL G F+G L+ +F+ + I S SA W++ APIF 



Sbjct: 359 VAMFNWSNMGKFIAVGLTDILESSGLSGIPAFVGLALLSSFLCMFIASGSAIWSILAPIF 418 

Query: 428 VPMLMLAGYAPEVIQAAYRIGDSVTNIITPMMSYFGLIMATVIKYKKDAGVGTLISMMLP 487 

VPM ML G+ P Q +RI DS + P+ + L + + +YK DA +GT S++LP 
Sbjct: 419 VPMFMLLGFHPAFAQILFRIADSSVLPLAPVSPFVPLFLGFLQRYKPDAKLGTYYSLVLP 478 

Query: 488 YSAFFLIAWIALFCIWVFVLGLPVGPG 514 

Y FL+ W+ + W +++GLP+GPG 
Sbjct: 479 YPLIFLWWLLMLLAW-YLVGLPIGPG 504 

Based on this analysis, including the presence of several putative transmembrane domains and the 
predicted actinin-type actin-binding domain signature (shown in bold) in the gonococcal protein, 
it is predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could 
be useful antigens for vaccines or diagnostics, or for raising antibodies. 
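The specification does not say how the putative transmembrane domains were predicted. One classic heuristic, shown here purely as an illustrative substitute, averages Kyte-Doolittle hydropathy values over a sliding window of 19 residues and flags windows whose mean exceeds about 1.6 as candidate membrane-spanning segments. The function name `hydrophobic_windows` and its default parameters are assumptions; a careful analysis would normally use a dedicated prediction tool.

```python
# Kyte-Doolittle hydropathy scale (one value per standard amino acid).
KD_SCALE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobic_windows(protein: str, window: int = 19,
                        threshold: float = 1.6) -> list[int]:
    """Return 1-based start positions of windows whose mean hydropathy exceeds threshold.

    Residues missing from the scale (e.g. 'X' for an undetermined residue)
    contribute 0. A sequence shorter than the window yields no windows.
    """
    scores = [KD_SCALE.get(aa, 0.0) for aa in protein.upper()]
    starts = []
    for i in range(len(scores) - window + 1):
        if sum(scores[i:i + window]) / window > threshold:
            starts.append(i + 1)
    return starts


# Residues 1-50 of ORF12-1, as printed in the alignments above.
orf12_1_nterm = "MSQTDTQRDGRFLRTVEWLGNMLPHPVTLFIIFIVLLLIASAVGAYFGLS"
hits = hydrophobic_windows(orf12_1_nterm)
print(hits)
```

For this fragment the window starting at residue 27 (VTLFIIFIVLLLIASAVGA) averages about 2.9 and is flagged, while the polar N-terminal window starting at residue 1 is not.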

Example 17

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 141>: 

1 ..ACAGCCGGCG CAGCAGGTTn CnCGGTCTTC GTTTTCGTAA CGGACAGTCA

51 GGTGGAGGTG TTCGGGAACA TCCAGACCGC AGTGGAAACA GGTTTTTTTC 

101 ATGGCATTTC GGTTTCGTCT GTGTTTGGTG CGGCGGCACA AGACTCGGCA 

151 ATgGCTTCGC GCAGTGCGTC TATACCGGTA TTTTCAGCAA CGGAAATGCG

201 GACGGcGgCA ATTTTTCCCG CAGCGTCGCG CCATATGCCC GTGTTTTgTT 

251 CTTCAGACGG CAGCAGGTCG GTTTTGTTGT ACACCTTgAT GCACGGAaTA 

301 TCGCCGGCAT GGATTTCTTG CAGTACGTTT TCCACGTCTT CAATCTGCTG 

351 TCCGCTGTTC GGAGCGGCGG CATCGACGAC GTGCAGCAGC ACATCgGcTT 

401 gCGCGGTTTC TTCCAGCGTG GCgGAAAAGG CGGAAATCAG TTTgTGCGGC

451 agATyGCTnA CGAATCCGAC GGTATCGGTC AGGATAATGC TGCATTCGGG 

501 ACT.. 

This corresponds to the amino acid sequence <SEQ ID 142; ORF14>: 

1 ..TAGAAGXXVF VFVTDSQVEV FGNIQTAVET GFFHGISVSS VFGAAAQDSA 

51 MASRSASIPV FSATEMRTAA IFPAASRHMP VFCSSDGSRS VLLYTLMHGI

101 SPAWISCSTF STSSICCPLF GAAASTTCSS TSACAVSSSV AEKAEISLCG 

151 RXLTNPTVSV RIMLHSG..

Computer analysis of this amino acid sequence gave the following results: 

Homology with a predicted ORF from N. meningitidis (strain A) 
ORF14 shows 94.0% identity over a 167aa overlap with an ORF (ORF14a) from strain A of N.
meningitidis: 

10 20 30 

orf14.pep                                 TAGAAGXXVFVFVTDSQVEVFGNIQTAVET

1:1111 I I I I I I I : I : : I I I I : I I I I I 
orf14a     GRQLGFLRVGGALFVITAQARVNNALCDCLTTGAAGFAVFVFVTDGQMQVFGNVQPAVET

150 160 170 180 190 200 

40 50 60 70 80 90 

orf14.pep  GFFHGISVSSVFGAAAQDSAMASRSASIPVFSATEMRTAAIFPAASRHMPVFCSSDGSRS

I I | I | I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I ! I I I I I I I t I I I I

orf 14a GFFHGISVSSVFGAAAQYSAMASRSASIPVFSATEMRTAAIFPAASRHMPVFCSSDGSRS 
210 220 230 240 250 260 

100 110 120 130 140 150 

orf14.pep  VLLYTLMHGISPAWISCSTFSTSSICCPLFGAAASTTCSSTSACAVSSSVAEKAEISLCG

II IN I Mill Ml IMI MM MIMII I I I H MM I I I I Ml III M II !! I Ml I I 
orf 14a VLLYTLMHGISPAWISCSTFSTSSICCPLFGAAASTTCSSTSACAVSSSVAEKAEISLCG 
270 280 290 300 310 320 
160

orf14.pep  RXLTNPTVSVRIMLHSG
I I I I I I I I I I I I I I I I 

orf14a     RSLTNPTVSVRIMLHSGLMYSRRAVVSSVAKSWSFAYMPDLVSRLNRLDLPTLVX
330 340 350 360 370 380 

The complete length ORF14a nucleotide sequence <SEQ ID 143> is: 

1 ATGGAGGATT TGCAGGAAAT CGGGTTCGAT GTCGCCGCCG TAAAGGTAGG 

51 TCGGCAGCGC GAACATCATC GTCTGCATCA TCCCCAGCCC GGCAACGGCG 

101 AGGCGGACGA TGTATTGTTT GCGTTCTTTT TGGTTGGCGG CTTCGATTTT 

151 TTGCGCGTCA TAGGGTGCGG CGGTGTAGCC TATCTGCCTG ATTTTCAACA 

201 GAATGTCGGA AAGGCGGATT TTGCCGTCGT CCCAGACGAC GCGGCAGCGG 

251 TGCGTGCTGT AATTGAGGTC GATGCGGACG ATGCCGTCTG TACGCAAAAG 

301 CTGCTGTTCG ATCAGCCAGA CGCAGGCGGC GCAGGTGATG CCGCCGAGCA 

351 TTAAAACCGC CTCGCGCGTG CCGCCGTGGG TTTCCACAAA GTCGGACTGG 

401 ACTTCGGGCA GGTCGTACAG GCGGATTTGG TCGAGGATTT CTTGGGGCGG 

451 CAGCTCGGTT TTTTGCGCGT CGGCGGTGCG TTGTTTGTAA TAACTGCCCA 

501 AGCCCGCGTC AATAATGCTT TGTGCGACTG CCTGACAACC GGCGCAGCAG 

551 GTTTCGCGGT CTTCGTTTTC GTAACGGACG GTCAGATGCA GGTTTTCGGG 

601 AACGTCCAGC CCGCAGTGGA AACAGGTTTT TTTCATGGCA TTTCGGTTTC 

651 GTCTGTGTTT GGTGCGGCGG CACAATACTC GGCAATGGCT TCGCGCAGTG 

701 CGTCTATACC GGTATTTTCA GCAACGGAAA TGCGGACGGC GGCAATTTTT 

751 CCCGCAGCGT CGCGCCATAT GCCCGTGTTT TGTTCTTCAG ACGGCAGCAG 

801 GTCGGTTTTG TTGTACACCT TGATGCACGG AATATCGCCG GCATGGATTT 

851 CTTGCAGTAC GTTTTCCACG TCTTCAATCT GCTGTCCGCT GTTCGGAGCG 

901 GCGGCATCGA CGACGTGCAG CAGCACATCG GCTTGCGCGG TTTCTTCCAG 

951 CGTGGCGGAA AAGGCGGAAA TCAGTTTGTG CGGCAGATCG CTGACGAATC 

1001 CGACGGTATC GGTCAGGATA ATGCTGCATT CGGGACTGAT GTACAGCCGC 

1051 CGCGCCGTCG TGTCGAGTGT GGCGAAAAGC TGGTCTTTCG CATATATGCC 

1101 CGACTTGGTC AGCCGGTTGA ACAGACTGGA TTTGCCGACA TTGGTATAG 

This encodes a protein having amino acid sequence <SEQ ID 144>: 

1 MEDLQEIGFD VAAVKVGRQR EHHRLHHPQP GNGEADDVLF AFFLVGGFDF 

51 LRVIGCGGVA YLPDFQQNVG KADFAVVPDD AAAVRAVIEV DADDAVCTQK

101 LLFDQPDAGG AGDAAEH*NR LARAAVGFHK VGLDFGQVVQ ADLVEDFLGR

151 QLGFLRVGGA LFVITAQARV NNALCDCLTT GAAGFAVFVF VTDGQMQVFG 

201 NVQPAVETGF FHGISVSSVF GAAAQYSAMA SRSASIPVFS ATEMRTAAIF 

251 PAASRHMPVF CSSDGSRSVL LYTLMHGISP AWISCSTFST SSICCPLFGA 

301 AASTTCSSTS ACAVSSSVAE KAEISLCGRS LTNPTVSVRI MLHSGLMYSR 

351 RAVVSSVAKS WSFAYMPDLV SRLNRLDLPT LV*

It should be noted that this sequence includes a stop codon at position 118. 
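The stop codon can be located directly in the nucleotide sequence: codon 118 of <SEQ ID 143> occupies nucleotides 352-354, which read TAA. The sketch below uses a hypothetical helper, `in_frame_stops`, that scans a fragment beginning on a codon boundary for the three standard stop codons (TAA, TAG, TGA) and reports each hit using the codon numbering of the full ORF.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def in_frame_stops(nt: str, first_codon: int = 1) -> list[tuple[int, str]]:
    """List (codon number, codon) for every stop codon in the reading frame.

    nt must start on a codon boundary; first_codon is the number of its
    first codon within the full ORF, so results use the ORF's numbering.
    Spaces in nt are ignored.
    """
    seq = nt.replace(" ", "").upper()
    stops = []
    for i in range(0, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if codon in STOP_CODONS:
            stops.append((first_codon + i // 3, codon))
    return stops


# Nucleotides 331-360 of ORF14a <SEQ ID 143>, i.e. codons 111-120 (AGDAAEH*NR).
fragment = "GCAGGTGATG CCGCCGAGCA TTAAAACCGC"
print(in_frame_stops(fragment, first_codon=111))  # [(118, 'TAA')]
```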
Homology with a predicted ORF from N. gonorrhoeae

ORF14 shows 89.8% identity over a 167aa overlap with a predicted ORF (ORF14.ng) from N. 
gonorrhoeae: 

orf14.pep                                 TAGAAGXXVFVFVTDSQVEVFGNIQTAVET 30

II III I I : I I : I : 1 :: I I I I : I INI 
orf14ng    GRQFGFFRVGGASFVITAQAGIDDALCDCLTADAAGFAVFAFVADGQMQVFGNVQPAVET 208

orf 14 . pep GFFHGISVSSVFGAAAQDSAMASRSASIPVFSATEMRTAAIFPAASRHMPVFCSSDGSRS 90 

I I I I I I I I I I I I I I I I I I I I 1 I I I I I I I I I I I I I i I I I I I I I I I I I I I I I I I I I I I I I I 
orf14ng    GFFHGISVSSVFGAAAQYSAMASRSASIPVFSATEMRTAAIFPAASRHMPVFCSSDGSRS 268

orf 14 . pep VLLYTLMHGISPAWISCSTFSTSSICCPLFGAAASTTCSSTSACAVSSSVAEKAEISLCG 150 

I I I I I II I I I I I I M I I I I I I I I I I I I I I I I I I I I I I I I I II : I I I : I I I I I I I I I I I 
orfl4ng VLLYTLMHGISWAWISCSTFSTSSICCPLFRAAASTTCSSTSACTVSSKVAEKAEISLCG 328 

orf 14. pep RXLTNPTVSVRIMLHSG 167 
I I I 1 I I I I I I I I M : I 

orf14ng    RSLTNPTVSVRIMLHAGLMYSRRAVVSRVAKSWSFAYMPDLVSRLNRLDLPTLV 382

The complete length ORF14ng nucleotide sequence <SEQ ID 145> is predicted to encode a protein having amino acid sequence <SEQ ID 146>:
1 MEDLQEIGFD VAAVKVGRQR EHHRLHHTQS GNGKADDVLF AFFLVGGFDF

51 LRVIGCGGVA CLPDFQQNVG EADFAVVPDD AAAVRAVIEV DADDAVCAQK

101 LLFDQPDAGG AGNAAEHQHC FVRAIMGFHK VGLDFGQVVQ ADLVEDFLGR

151 QFGFFRVGGA SFVITAQAGI DDALCDCLTA DAAGFAVFAF VADGQMQVFG

201 NVQPAVETGF FHGISVSSVF GAAAQYSAMA SRSASIPVFS ATEMRTAAIF

251 PAASRHMPVF CSSDGSRSVL LYTLMHGISW AWISCSTFST SSICCPLFRA

301 AASTTCSSTS ACTVSSKVAE KAEISLCGRS LTNPTVSVRI MLHAGLMYSR

351 RAVVSRVAKS WSFAYMPDLV SRLNRLDLPT LV*

Based on the putative transmembrane domain in the gonococcal protein, it is predicted that the 
proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be useful antigens for 
vaccines or diagnostics, or for raising antibodies. 



Example 18 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 147>: 



1 ..GGCCATTACT CCGACCGCAC TTGGAAGCCG CGTTTGGNCG GCCGCCGTCT

51 GCCGTATCTG CTTTATGGCA CGCTGATTGC GGTTATTGTG ATGATTTTGA

101 TGCCGAACTC GGGCAGCTTC GGTTTCGGCT ATGCGTCGCT GGCGGCTTTG

151 TCGTTCGGCG CGCTGATGAT TGCGCTGTTA GACGTGTCGT CAAATATGGC

201 GATGCAGCCG TTTAAGATGA TGGTCGGCGA CATGGTCAAC GAGGAGCAGA

251 AAA.NTACGC CTACGGGATT CAAAGTTTCT TAGCAAATAC GGGCGCGGTC

301 GTGGCGGCGA TTCTGCCGTT TGTGTTTGCG TATATCGGTT TGGCGAACAC

351 CGCCGANAAA GGCGTTGTGC CGCAGACCGT GGTCGTGGCG TTTTATGTGG

401 GTGCGGCGTT GCTGGTGATT ACCAGCGCGT TCACGATTTT CAAAGTGAAG

451 GAATACGANC CGGAAACCTA CGCCCGTTAC CACGGCATCG ATGTCGCCGC

501 GAATCAGGAA AAAGCCAACT GGATCGCACT CTTAAAA.CC GCGC..



This corresponds to the amino acid sequence <SEQ ID 148; ORF16>: 



1 ..GHYSDRTWKP RLXGRRLPYL LYGTLIAVIV MILMPNSGSF GFGYASLAAL 

51 SFGALMIALL DVSSNMAMQP FKMMVGDMVN EEQKXYAYGI QSFLANTGAV 

101 VAAILPFVFA YIGLANTAXK GVVPQTVVVA FYVGAALLVI TSAFTIFKVK

151 EYXPETYARY HGIDVAANQE KANWIALLKX A.. 

Further work revealed the complete nucleotide sequence <SEQ ID 149>: 



1 ATGTCGGAAT ATACGCCTCA AACAGCAAAA CAAGGTTTGC CCGCGCTGGC

51 AAAAAGCACG ATTTGGATGC TCAGTTTCGG CTTTCTCGGC GTTCAGACGG

101 CCTTTACCCT GCAAAGCTCG CAAATGAGCC GCATTTTTCA AACGCTAGGC

151 GCAGACCCGC ACAATTTGGG CTGGTTTTTC ATCCTGCCGC CGCTGGCGGG

201 GATGCTGGTG CAGCCGATTG TCGGCCATTA CTCCGACCGC ACTTGGAAGC

251 CGCGTTTGGG CGGCCGCCGT CTGCCGTATC TGCTTTATGG CACGCTGATT

301 GCGGTTATTG TGATGATTTT GATGCCGAAC TCGGGCAGCT TCGGTTTCGG

351 CTATGCGTCG CTGGCGGCTT TGTCGTTCGG CGCGCTGATG ATTGCGCTGT

401 TAGACGTGTC GTCAAATATG GCGATGCAGC CGTTTAAGAT GATGGTCGGC

451 GACATGGTCA ACGAGGAGCA GAAAGGCTAC GCCTACGGGA TTCAAAGTTT

501 CTTAGCAAAT ACGGGCGCGG TCGTGGCGGC GATTCTGCCG TTTGTGTTTG

551 CGTATATCGG TTTGGCGAAC ACCGCCGAGA AAGGCGTTGT GCCGCAGACC

601 GTGGTCGTGG CGTTTTATGT GGGTGCGGCG TTGCTGGTGA TTACCAGCGC

651 GTTCACGATT TTCAAAGTGA AGGAATACGA TCCGGAAACC TACGCCCGTT

701 ACCACGGCAT CGATGTCGCC GCGAATCAGG AAAAAGCCAA CTGGATCGAA

751 CTCTTGAAAA CCGCGCCTAA GGCGTTTTGG ACGGTTACTT TGGTGCAATT

801 CTTCTGCTGG TTCGCCTTCC AATATATGTG GACTTACTCG GCAGGCGCGA

851 TTGCGGAAAA CGTCTGGCAC ACCACCGATG CGTCTTCCGT AGGTTATCAG

901 GAGGCGGGTA ACTGGTACGG CGTTTTGGCG GCGGTGCAGT CGGTTGCGGC

951 GGTGATTTGT TCGTTTGTAT TGGCGAAAGT GCCGAATAAA TACCATAAGG

1001 CGGGTTATTT CGGCTGTTTG GCTTTGGGCG CGCTCGGCTT TTTCTCCGTT

1051 TTCTTCATCG GCAACCAATA CGCGCTGGTG TTGTCTTATA CCTTAATCGG

1101 CATCGCTTGG GCGGGCATTA TCACTTATCC GCTGACGATT GTGACCAACG

1151 CCTTGTCGGG CAAGCATATG GGCACTTACT TGGGCTTGTT TAACGGCTCT

1201 ATCTGTATGC CTCAAATCGT CGCTTCGCTG TTGAGTTTCG TGCTTTTCCC

1251 TATGCTGGGC GGCTTGCAGG CCACTATGTT CTTGGTAGGG GGCGTCGTCC

1301 TGCTGCTGGG CGCGTTTTCC GTGTTCCTGA TTAAAGAAAC ACACGGCGGG

1351 GTTTGA

This corresponds to the amino acid sequence <SEQ ID 150; ORF16-1>:

1 MSEYTPQTAK QGLPALAKST IWMLSFGFLG VQTAFTLQSS QMSRIFQTLG

51 ADPHNLGWFF ILPPLAGMLV QPIVGHYSDR TWKPRLGGRR LPYLLYGTLI

101 AVIVMILMPN SGSFGFGYAS LAALSFGALM IALLDVSSNM AMQPFKMMVG

151 DMVNEEQKGY AYGIQSFLAN TGAVVAAILP FVFAYIGLAN TAEKGVVPQT

201 VVVAFYVGAA LLVITSAFTI FKVKEYDPET YARYHGIDVA ANQEKANWIE

251 LLKTAPKAFW TVTLVQFFCW FAFQYMWTYS AGAIAENVWH TTDASSVGYQ

301 EAGNWYGVLA AVQSVAAVIC SFVLAKVPNK YHKAGYFGCL ALGALGFFSV

351 FFIGNQYALV LSYTLIGIAW AGIITYPLTI VTNALSGKHM GTYLGLFNGS

401 ICMPQIVASL LSFVLFPMLG GLQATMFLVG GVVLLLGAFS VFLIKETHGG

451 V*

Computer analysis of this amino acid sequence gave the following results: 



Homology with a predicted ORF from N. meningitidis (strain A)

ORF16 shows 96.7% identity over a 181aa overlap with an ORF (ORF16a) from strain A of N.
meningitidis: 

10 20 30 

orf16.pep                                 GHYSDRTWKPRLXGRRLPYLLYGTLIAVIV

f I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
orf16a    IFQTLGADPHSLGWFFILPPLAGMLVQPIVGHYSDRTWKPRLGGRRLPYLLYGTLIAVIV

50 60 70 80 90 100 



40 50 60 70 80 90 

orf16.pep MILMPNSGSFGFGYASLAALSFGALMIALLDVSSNMAMQPFKMMVGDMVNEEQKXYAYGI
I I I I I I I I I I I I I I I I I I I I I I I I I ! I I I I I I I I i I I I I I I I I I I I I I I I I I I I I I I I I 
orf16a    MILMPNSGSFGFGYASLAALSFGALMIALLDVSSNMAMQPFKMMVGDMVNEEQKGYAYGI

110 120 130 140 150 160 



100 110 120 130 140 150 

orf16.pep QSFLANTGAVVAAILPFVFAYIGLANTAXKGVVPQTVVVAFYVGAALLVITSAFTIFKVK

I | I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I II I 1 I I I I I I I I I I I I I I I I I I I I 
orf16a    QSFLANTGAVVAAILPFVFAYIGLANTAEKGVVPQTVVVAFYVGAALLVITSAFTIFKVK

170 180 190 200 210 220 

160 170 180 

orf16.pep EYXPETYARYHGIDVAANQEKANWIALLKXA

II I I I I I I I I I I I I I I I I I I I I I I 111:1 

orf16a    EYNPETYARYHGIDVAANQEKANWIELLKTAPKAFWTVTLVQFFCWFAFQYMWTYSAGAI
230 240 250 260 270 280 



orf16a    AENVWHTTDASSVGYQEAGNWYGVLAAVQSVAAVICSFVLAKVPNKYHKAGYFGCLALGA
290 300 310 320 330 340 

The complete length ORF16a nucleotide sequence <SEQ ID 151> is: 



1 ATGTCGGAAT ATACGCCTCA AACAGCAAAA CAAGGTTTGC CCGCGCTGGC

51 AAAAAGCACG ATTTGGATGC TCAGTTTCGG CTTTCTCGGC GTTCAGACGG

101 CCTTTACCCT GCAAAGCTCG CAGATGAGCC GCATCTTCCA GACGCTCGGT

151 GCCGATCCGC ACAGCCTCGG CTGGTTCTTT ATCCTGCCGC CGCTGGCGGG

201 GATGCTGGTG CAGCCGATTG TCGGCCATTA CTCCGACCGC ACTTGGAAGC

251 CGCGTTTGGG CGGCCGCCGT CTGCCGTATC TGCTTTATGG CACGCTGATT

301 GCGGTTATTG TGATGATTTT GATGCCGAAC TCGGGCAGCT TCGGTTTCGG

351 CTATGCGTCG CTGGCGGCTT TGTCGTTCGG CGCGCTGATG ATTGCGCTGT

401 TAGACGTGTC GTCAAATATG GCGATGCAGC CGTTTAAGAT GATGGTCGGC

451 GACATGGTCA ACGAGGAGCA GAAAGGCTAC GCCTACGGGA TTCAAAGTTT

501 CTTAGCGAAT ACGGGCGCGG TCGTGGCGGC GATTCTGCCG TTTGTGTTTG

551 CGTATATCGG TTTGGCGAAC ACCGCCGAGA AAGGCGTTGT GCCGCAGACC

601 GTGGTCGTGG CGTTTTATGT GGGTGCGGCG TTGCTGGTGA TTACCAGCGC

651 GTTCACGATT TTCAAAGTGA AGGAATACAA TCCGGAAACC TACGCCCGTT

701 ACCACGGCAT CGATGTCGCC GCGAATCAGG AAAAAGCCAA CTGGATCGAA

751 CTCTTGAAAA CCGCGCCTAA GGCGTTTTGG ACGGTTACTT TGGTGCAATT

801 CTTCTGCTGG TTCGCCTTCC AATATATGTG GACTTACTCG GCAGGCGCGA

851 TTGCGGAAAA CGTCTGGCAC ACCACCGATG CGTCTTCCGT AGGTTATCAG

901 GAGGCGGGTA ACTGGTACGG CGTTTTGGCG GCGGTGCAGT CGGTTGCGGC

951 GGTGATTTGT TCGTTTGTAT TGGCGAAAGT GCCGAATAAA TACCATAAGG

1001 CGGGTTATTT CGGCTGTTTG GCTTTGGGCG CGCTCGGCTT TTTCTCCGTT

1051 TTCTTCATCG GCAACCAATA CGCGCTGGTG TTGTCTTATA CCTTAATCGG 

1101 CATCGCTTGG GCGGGCATTA TCACTTATCC GCTGACGATT GTGACCAACG 

1151 CCTTGTCGGG CAAGCATATG GGCACTTACT TGGGCCTGTT TAACGGCTCT 

1201 ATCTGTATGC CGCAAATCGT CGCTTCGCTG TTGAGTTTCG TGCTTTTCCC 

1251 TATGCTGGGC GGCTTGCAGG CCACTATGTT CTTGGTAGGG GGCGTCGTCC 

1301 TGCTGCTGGG CGCGTTTTCC GTGTTCCTGA TTAAAGAAAC ACACGGCGGG 

1351 GTTTGA 

This encodes a protein having amino acid sequence <SEQ ID 152>: 

1 MSEYTPQTAK QGLPALAKST IWMLSFGFLG VQTAFTLQSS QMSRIFQTLG 

51 ADPHSLGWFF ILPPLAGMLV QPIVGHYSDR TWKPRLGGRR LPYLLYGTLI

101 AVIVMILMPN SGSFGFGYAS LAALSFGALM IALLDVSSNM AMQPFKMMVG

151 DMVNEEQKGY AYGIQSFLAN TGAVVAAILP FVFAYIGLAN TAEKGVVPQT

201 VVVAFYVGAA LLVITSAFTI FKVKEYNPET YARYHGIDVA ANQEKANWIE

251 LLKTAPKAFW TVTLVQFFCW FAFQYMWTYS AGAIAENVWH TTDASSVGYQ

301 EAGNWYGVLA AVQSVAAVIC SFVLAKVPNK YHKAGYFGCL ALGALGFFSV

351 FFIGNQYALV LSYTLIGIAW AGIITYPLTI VTNALSGKHM GTYLGLFNGS

401 ICMPQIVASL LSFVLFPMLG GLQATMFLVG GVVLLLGAFS VFLIKETHGG

451 V* 

ORF16a and ORF16-1 show 99.6% identity in 451 aa overlap: 

10 20 30 40 50 60 

orf16a.pep MSEYTPQTAKQGLPALAKSTIWMLSFGFLGVQTAFTLQSSQMSRIFQTLGADPHSLGWFF

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I : I I I I I 
orf16-1    MSEYTPQTAKQGLPALAKSTIWMLSFGFLGVQTAFTLQSSQMSRIFQTLGADPHNLGWFF

10 20 30 40 50 60 

70 80 90 100 110 120 

orf 1 6a . pep ILPPLAGMLVQPIVGHYSDRTWKPRLGGRRLPYLLYGTLIAVIVMILMPNSGSFGFGYAS 
I I I I I I j I I [ II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I i I 
orf 16-1 ILPPLAGMLVQPIVGHYSDRTWKPRLGGRRLPYLLYGTLIAVIVMILMPNSGSFGFGYAS 

70 80 90 100 110 120 

130 140 150 160 170 180 

orf16a.pep LAALSFGALMIALLDVSSNMAMQPFKMMVGDMVNEEQKGYAYGIQSFLANTGAVVAAILP

I | | | | I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I II I I I I I I I I I I II I I I I I I I I I I 
orf16-1    LAALSFGALMIALLDVSSNMAMQPFKMMVGDMVNEEQKGYAYGIQSFLANTGAVVAAILP

130 140 150 160 170 180 

190 200 210 220 230 240 

orf16a.pep FVFAYIGLANTAEKGVVPQTVVVAFYVGAALLVITSAFTIFKVKEYNPETYARYHGIDVA
I | | | | I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I II II I I I I : I I II I I I I I I I I I 
orf16-1    FVFAYIGLANTAEKGVVPQTVVVAFYVGAALLVITSAFTIFKVKEYDPETYARYHGIDVA

190 200 210 220 230 240 

250 260 270 280 290 300 

orf 1 6a . pep ANQEKANWIELLKTAPKAFWTVTLVQFFCWFAFQYMWTYSAGAIAENVWHTTDASSVGYQ 
I I | | II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I i I I I I 
or f 1 6- 1 ANQEKANWIELLKTAPKAFWTVTLVQFFCWFAFQYMWTYSAGAIAENVWHTTDASSVGYQ 

250 260 270 280 290 300 

310 320 330 340 350 360 

orf 16a . pep EAGNWYGVLAAVQSVAAVICSFVLAKVPNKYHKAGYFGCLALGALGFFSVFFIGNQYALV 

I | | | | I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I II I II I I I I I I I I I II I I I I I I I I 
orf 16-1 EAGNWYGVLAAVQSVAAVICSFVLAKVPNKYHKAGYFGCLALGALGFFSVFFIGNQYALV 

310 320 330 340 350 360 

370 380 390 400 410 420 

orf 16a. pep LSYTLIGIAWAGIITYPLTIVTNALSGKHMGTYLGLFNGSICMPQIVASLLSFVLFPMLG 
I || | | I I I I I I I I I I I I I I 1 I I I I I I I I I I I I I I I I I II I I I I I I I I I II I I I I I I I I I I 
orf 16-1 LSYTLIGIAWAGIITYPLTIVTNALSGKHMGTYLGLFNGSICMPQIVASLLSFVLFPMLG 

370 380 390 400 410 420 

430 440 450 

orf16a.pep GLQATMFLVGGVVLLLGAFSVFLIKETHGGVX
I | | | I I I I II I I I I I I I I I I I I I I I I I II I I I 
orf16-1    GLQATMFLVGGVVLLLGAFSVFLIKETHGGVX

430 440 450 
Homology with a predicted ORF from N. gonorrhoeae

ORF16 shows 93.9% identity over a 181aa overlap with a predicted ORF (ORF16.ng) from N.
gonorrhoeae:

orf16.pep                                 GHYSDRTWKPRLXGRRLPYLLYGTLIAVIV 30
I : I I I I I I I I I I I I I I I I I I I I I I I I 1 I I
orf16ng   HFSNARRRPAQFGLVFHPAAAGGDAGSADSGYYSDRTWKPRLGGRRLPYLLYGTLIAVIV 131

orf16.pep MILMPNSGSFGFGYASLAALSFGALMIALLDVSSNMAMQPFKMMVGDMVNEEQKXYAYGI 90
I I I M I i I I I I I I I I I I I I i I I I I I I I I I I ! I i I ! I I I I I I I i ! I I I I I I I i I I Mill
orf16ng   MILMPNSGSFGFGYASLAALSFGALMIALLDVSSNMAMQPFKMMVGDMVNEEQKSYAYGI 191

orf16.pep QSFLANTGAVVAAILPFVFAYIGLANTAXKGVVPQTVVVAFYVGAALLVITSAFTIFKVK 150
I I I I I I I I I I I I I I I I I II I II I I I I I I I I I I I I I I I I I I I I I I I I : I I II I I I III
orf16ng   QSFLANTDAVVAAILPFVFAYIGLANTAEKGVVPQTVVVAFYVGAALLIITSAFTISKVK 251

orf16.pep EYXPETYARYHGIDVAANQEKANWIALLKXA 181
II i I I I I II I II I II M I I I I I I : 111:1
orf16ng   EYDPETYARYHGIDVAANQEKANWFELLKTAPKVFWTVTPVQFFCWFAFRYMWTYSAGAI 311



The complete length ORF16ng nucleotide sequence <SEQ ID 153> is: 



1 ATGATAGGGG ATCGCCGCGC CGGCAACCAT TTCGGATTTT CCAAAGCAAA

51 TACTTTTCAA ATCAAAAAAA AGGATTTACT TTATGTCGGA ATATACGCCT

101 CAAACAGCAA AACAAGGTTT GCCCGCGCCG GCAAAAAGCA CGATTTGGAT

151 GTTGAGCTTC GGCTATCTCG GCGTTCAGAC GGCCTTTACC CTGCAAAGCT

201 CGCAGATGAG CCGCATTTTT CAAACGCTAG GCGCAGACCC GCACAATTTG

251 GGCTGGTTTT TCATCCTGCC GCCGCTGGCG GGGATGCTGG TTCAGCCGAT

301 AGTGGCTACT ACTCAGACCG CACTTGGAAG CCGCGCTTGG GCGGCCGCCG

351 CCTGCCGTAT CTGCTTTACG GCACGCTGAT TGCGGTCATC GTGATGATTT

401 TGATGCCGAA CTCGGGCAGC TTCGGTTTCG GCTATGCGTC GCTGGCGGCC

451 TTGTCGTTCG GCGCGCTGAT GATTGCGCTG TTGGACGTGT CGTCGAATAT

501 GGCGATGCAG CCGTTTAAGA TGATGGTCGG CGATATGGTC AACGAGGAGC

551 AGAAAAGCTA CGCCTACGGG ATTCAAAGTT TCTTAGCGAA TACGGACGCG

601 GTTGTGGCAG CGATTCTGCC GTTTGTGTTC GCGTATATCG GTTTGGCGAA

651 CACTGCCGAG AAAGGCGTTG TGCCACAAAC CGTGGTCGTA GCATTCTATG

701 TGGGTGCGGC GTTACTGATT ATTACCAGTG CGTTCACAAT CTCCAAAGTC

751 AAAGAATACG ACCCGGAAAC CTACGCCCGT TACCACGGCA TCGATGTCGC

801 CGCGAATCAG GAAAAAGCCA ACTGGTTCGA ACTCTTAAAA ACCGCGCCTA

851 AAGTGTTTTG GACGGTTACT CCGGTACAGT TTTTCTGCTG GTTCGCCTTC

901 CGGTATATGT GGACTTACTC GGCAGGCGCG ATTGCAGAAA ACGTCTGGCA

951 CACTACCGAT GCGTCTTCCG TAGGCCATCA GGAGGCGGGC AACCGGTACG

1001 GCGTTTTGGC GGCGGTGTAG



This encodes a protein having amino acid sequence <SEQ ID 154>: 



1 MIGDRRAGNH FGFSKANTFQ IKKKDLLYVG IYASNSKTRF ARAGKKHDLD

51 VELRLSRRSD GLYPAKLADE PHFSNARRRP AQFGLVFHPA AAGGDAGSAD

101 SGYYSDRTWK PRLGGRRLPY LLYGTLIAVI VMILMPNSGS FGFGYASLAA

151 LSFGALMIAL LDVSSNMAMQ PFKMMVGDMV NEEQKSYAYG IQSFLANTDA

201 VVAAILPFVF AYIGLANTAE KGVVPQTVVV AFYVGAALLI ITSAFTISKV

251 KEYDPETYAR YHGIDVAANQ EKANWFELLK TAPKVFWTVT PVQFFCWFAF

301 RYMWTYSAGA IAENVWHTTD ASSVGHQEAG NRYGVLAAV*

ORF16ng and ORF16-1 show 89.3% identity in 261 aa overlap:

orf16-1.pep MLSFGFLGVQTAFTLQSSQMSRIFQTLGADPHNLGWFFILPPIAGMLVQPI-VGHYSDRT 80
orf16ng     DVELRLSRRSDGLYPAKLADEPHFSNARRRPAQFGLVF-HPAAAGGDAGSADSGYYSDRT 108

orf16-1.pep WKPRLGGRRLPYLLYGTLIAVIVMILMPNSGSFGFGYASLAALSFGALMIALLDVSSNMA 140
orf16ng     WKPRLGGRRLPYLLYGTLIAVIVMILMPNSGSFGFGYASLAALSFGALMIALLDVSSNMA 168

orf16-1.pep MQPFKMMVGDMVNEEQKGYAYGIQSFLANTGAVVAAILPHVFAYIGLANTAEKGVVPQTV 200
orf16ng     MQPFKMMVGDMVNEEQKSYAYGIQSFLANTDAVVAAILPFVFAYIGLANTAEKGVVPQTV 228

orf16-1.pep VVAFYVGAALLVITSAFTIFKVKEYDPETYARYHGIDVAANQEKANWIELLKTAPKAFWT 260
orf16ng     VVAFYVGAALLIITSAFTISKVKEYDPETYARYHGIDVAANQEKANWFELLKTAPKVFWT 288

orf16-1.pep VTLVQFFCWFAFQYMWTYSAGAIAENVWHTTDASSVGYQEAGNWYGVLAAVQSVAAVICS 320
orf16ng     VTPVQFFCWFAFRYMWTYSAGAIAENVWHTTDASSVGHQEAGNRYGVLAAVX 340

Based on this analysis, including the presence of several putative transmembrane domains in the 
gonococcal protein, it is predicted that the proteins from N. meningitidis and N. gonorrhoeae, and 
their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies. 
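The source does not say how the putative transmembrane domains were identified. As an illustration only, the sketch below applies a Kyte-Doolittle hydropathy scan, a widely used way of flagging candidate membrane-spanning stretches. The 19-residue window, the 1.6 threshold and the function name `hydrophobic_windows` are assumptions, not the procedure used for this analysis.

```python
# Kyte-Doolittle hydropathy index for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobic_windows(protein, window=19, threshold=1.6):
    """Return (start, end, mean) for each window of `window` residues whose
    mean hydropathy exceeds `threshold` (both defaults are assumptions).

    Positions are 1-based and inclusive. Windows containing anything other
    than the 20 standard residues (e.g. 'X' or '*') are skipped.
    """
    hits = []
    for i in range(len(protein) - window + 1):
        segment = protein[i:i + window]
        if any(aa not in KYTE_DOOLITTLE for aa in segment):
            continue
        mean = sum(KYTE_DOOLITTLE[aa] for aa in segment) / window
        if mean > threshold:
            hits.append((i + 1, i + window, round(mean, 2)))
    return hits


# Residues 229-247 of ORF16ng, one hydrophobic stretch from the listing.
print(hydrophobic_windows("VVAFYVGAALLIITSAFTI"))  # [(1, 19, 2.24)]
```

On a full-length protein, overlapping windows are reported one by one; merging them into discrete predicted segments would be a further step.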



Example 19 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 155>: 

1 ATGTTGTTCC GTAAAACGAC CGCCGCCGTT TTGGCGCATA CCTTGATGCT 

51 GAACGGCTGT ACGTTGATGT TGTGGGGAAT GAACAACCCG GTCAGCGAAA 

101 CAATCACCCG NAAACACGTT GNCAAAGACC AAATCCGNGN CTTCGGTGTG 

151 GTTGCCGAAG ACAATGCCCA ATTGGAAAAG GGCAGCCTGG TGATGATGGG 

201 CGGAAAATAC TGGTTCGTCG TCAATCCCGA AGATTCGGCG AA.NTGACGG 

251 GNATTTTGAN GGCAGGGCTG GACAAACCCT TCCAAATAGT TNAGGATACC 

301 CCGAGCTATG C.TGCCACCA AGCCCTGCCG GTCAAACTCG GATCGNCTGG 

351 CAGCCAGAAT . . . 

This corresponds to the amino acid sequence <SEQ ID 156; ORF28>:



1 MLFRKTTAAV LAHTLMLNGC TLMLWGMNNP VSETITRKHV XKDQIRXFGV 
51 VAEDNAQLEK GSLVMMGGKY WFVVNPEDSA XXTGILXAGL DKPFQIVXDT
101 PSYXCHQALP VKLGSXGSQN. . . 

Further work revealed the complete nucleotide sequence <SEQ ID 157>: 



1 ATGTTGTTCC GTAAAACGAC CGCCGCCGTT TTGGCGGCAA CCTTGATGCT 

51 GAACGGCTGT ACGTTGATGT TGTGGGGAAT GAACAACCCG GTCAGCGAAA 

101 CAATCACCCG CAAACACGTT GACAAAGACC AAATCCGCGC CTTCGGTGTG 

151 GTTGCCGAAG ACAATGCCCA ATTGGAAAAG GGCAGCCTGG TGATGATGGG 

201 CGGAAAATAC TGGTTCGTCG TCAATCCCGA AGATTCGGCG AAGCTGACGG 

251 GCATTTTGAA GGCAGGGCTG GACAAACCCT TCCAAATAGT TGAGGATACC 

301 CCGAGCTATG CTCGCCACCA AGCCCTGCCG GTCAAACTCG AATCGCCTGG 

351 CAGCCAGAAT TTCAGTACCG AAGGCCTTTG CCTGCGCTAC GATACCGACA 

401 AGCCTGCCGA CATCGCCAAG CTGAAACAGC TCGGGTTTGA AGCGGTCAAA 

451 CTCGACAATC GGACCATTTA CACGCGCTGC GTATCCGCCA AAGGCAAATA 

501 CTACGCCACA CCGCAAAAAC TGAACGCCGA TTACCATTTT GAGCAAAGTG 

551 TGCCTGCCGA TATTTATTAC ACGGTTACTG AAGAACATAC CGACAAATCC 

601 AAGCTGTTTG CAAATATCTT ATATACGCCC CCCTTTTTGA TACTGGATGC 

651 GGCGGGCGCG GTACTGGCCT TGCCTGCGGC GGCTCTGGGT GCGGTCGTGG 

701 ATGCCGCCCG CAAATGA 

This corresponds to the amino acid sequence <SEQ ID 158; ORF28-1>:



1 MLFRKTTAAV LAATLMLNGC TLMLWGMNNP VSETITRKHV DKDQIRAFGV

51 VAEDNAQLEK GSLVMMGGKY WFVVNPEDSA KLTGILKAGL DKPFQIVEDT

101 PSYARHQALP VKLESPGSQN FSTEGLCLRY DTDKPADIAK LKQLGFEAVK 

151 LDNRTIYTRC VSAKGKYYAT PQKLNADYHF EQSVPADIYY TVTEEHTDKS 
201 KLFANILYTP PFLILDAAGA VLALPAAALG AVVDAARK*
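ORF28 was first identified as a partial sequence, and the complete ORF28-1 reading frame was obtained afterwards. The source does not describe the ORF-prediction software used; the following is only a minimal sketch of the usual first step, scanning the three forward frames for an ATG followed by an in-frame stop codon. The function `find_orfs` and its `min_codons` cut-off are hypothetical, and the reverse strand is not scanned.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=30):
    """Return (start, end) pairs for forward-strand ORFs in `dna`.

    `start` is the 0-based index of the ATG and `end` is exclusive, so
    dna[start:end] includes the stop codon. Only the first ATG after the
    previous stop in each frame is used, and frames that run off the end
    of the sequence without a stop are not reported.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return sorted(orfs)


# Toy sequence: ATG, 30 x GCT, TAA, flanked by a few extra bases.
toy = "CC" + "ATG" + "GCT" * 30 + "TAA" + "GG"
print(find_orfs(toy))  # [(2, 98)]
```

A real pipeline would also scan the reverse complement and consider alternative start codons; the sketch keeps only the core frame-scanning logic.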

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A)

ORF28 shows 79.2% identity over a 120aa overlap with an ORF (ORF28a) from strain A of N. 
meningitidis: 



orf28.pep MLFRKTTAAVLAHTLMLNGCTLMLWGMNNPVSETITRKHVXKDQIRXFGVVAEDNAQLEK 60
orf28a    MLFRKTTAAVLAATLMLNGCTVMMWGMNSPFSETTARKHVDKDQIRAFGVVAEDNAQLEK 60

orf28.pep GSLVMMGGKYWFVVNPEDSAXXTGILXAGLDKPFQIVXDTPSYXCHQALPVKLGSXGSQN 120
orf28a    GSLVMMGGKYWFVVNPEDSAKLTGILKAGLDKQFQMVEPNPRFA-YQALPVKLESPASQN 119

orf28a    FSTEGLCLRYDTDRPADIAKLKQLEFEAVELDNRTIYTRCVSAKGKYYATPQKLNADYHF 179

The complete length ORF28a nucleotide sequence <SEQ ID 159> is: 



1 ATGTTGTTCC GTAAAACGAC CGCCGCCGTT TTGGCGGCAA CCTTGATGTT 

51 GAACGGCTGT ACGGTAATGA TGTGGGGTAT GAACAGCCCG TTCAGCGAAA 

101 CGACCGCCCG CAAACACGTT GACAAGGACC AAATCCGCGC CTTCGGTGTG 

151 GTTGCCGAAG ACAATGCCCA ATTGGAAAAG GGCAGCCTGG TGATGATGGG 

201 CGGGAAATAC TGGTTCGTCG TCAATCCTGA AGATTCGGCG AAGCTGACGG 

251 GCATTTTGAA GGCCGGGTTG GACAAGCAGT TTCAAATGGT TGAGCCCAAC 

301 CCGCGCTTTG CCTACCAAGC CCTGCCGGTC AAACTCGAAT CGCCCGCCAG 

351 CCAGAATTTC AGTACCGAAG GCCTTTGCCT GCGCTACGAT ACCGACAGAC 

401 CTGCCGACAT CGCCAAGCTG AAACAGCTTG AGTTTGAAGC GGTCGAACTC 

451 GACAATCGGA CCATTTACAC GCGCTGCGTC TCCGCCAAAG GCAAATACTA

501 CGCCACACCG CAAAAACTGA ACGCCGATTA TCATTTTGAG CAAAGTGTGC 

551 CTGCCGATAT TTATTACACG GTTACGAAAA AACATACCGA CAAATCCAAG 

601 TTGTTTGAAA ATATTGCATA TACGCCCACC ACGTTGATAC TGGATGCGGT 

651 GGGCGCGGTG CTGGCCTTGC CTGTCGCGGC GTTGATTGCA GCCACGAATT 

701 CCTCAGACAA ATGA 

This encodes a protein having amino acid sequence <SEQ ID 160>: 



1 MLFRKTTAAV LAATLMLNGC TVMMWGMNSP FSETTARKHV DKDQIRAFGV

51 VAEDNAQLEK GSLVMMGGKY WFVVNPEDSA KLTGILKAGL DKQFQMVEPN

101 PRFAYQALPV KLESPASQNF STEGLCLRYD TDRPADIAKL KQLEFEAVEL 

151 DNRTIYTRCV SAKGKYYATP QKLNADYHFE QSVPADIYYT VTKKHTDKSK 

201 LFENIAYTPT TLILDAVGAV LALPVAALIA ATNSSDK* 



ORF28a and ORF28-1 show 86.1% identity in 238 aa overlap: 



orf28a.pep MLFRKTTAAVLAATLMLNGCTVMMWGMNSPFSETTARKHVDKDQIRAFGVVAEDNAQLEK 60
orf28-1    MLFRKTTAAVLAATLMLNGCTLMLWGMNNPVSETITRKHVDKDQIRAFGVVAEDNAQLEK 60

orf28a.pep GSLVMMGGKYWFVVNPEDSAKLTGILKAGLDKQFQMVEPNPRFA-YQALPVKLESPASQN 119
orf28-1    GSLVMMGGKYWFVVNPEDSAKLTGILKAGLDKPFQIVEDTPSYARHQALPVKLESPGSQN 120

orf28a.pep FSTEGLCLRYDTDRPADIAKLKQLEFEAVELDNRTIYTRCVSAKGKYYATPQKLNADYHF 179
orf28-1    FSTEGLCLRYDTDKPADIAKLKQLGFEAVKLDNRTIYTRCVSAKGKYYATPQKLNADYHF 180

orf28a.pep EQSVPADIYYTVTKKHTDKSKLFENIAYTPTTLILDAVGAVLALPVAALIAATNSSDKX 238
orf28-1    EQSVPADIYYTVTEEHTDKSKLFANILYTPPFLILDAAGAVLALPAAALGAVVDAARKX 239

Homology with a predicted ORF from N. gonorrhoeae

ORF28 shows 84.2% identity over a 120aa overlap with a predicted ORF (ORF28.ng) from N. gonorrhoeae:

orf28.pep MLFRKTTAAVLAHTLMLNGCTLMLWGMNNPVSETITRKHVXKDQIRXFGVVAEDNAQLEK 60
orf28ng   MLFRKTTAAVLAATLILNGCTMMLRGMNNPVSQTITRKHVDKDQIRAFGVVAEDNAQLEK 60

orf28.pep GSLVMMGGKYWFVVNPEDSAXXTGILXAGLDKPFQIVXDTPSYXCHQALPVKLGSXGSQN 120
orf28ng   GSLVMMGGKYWFAVNPEDSAKLTGLLKAGLDKPFQIVEDTPSYARHQALPVKFEAPGSQN 120

The complete length ORF28ng nucleotide sequence <SEQ ID 161> is:

1 ATGTTGTTCC GTAAAACGAC CGCCGCCGTT TTGGCGGCAA CCTTGATACT 

51 GAACGGCTGT ACGATGATGT TGCGGGGGAT GAACAACCCG GTCAGCCAAA

101 CAATCACCCG CAAACACGTT GACAAAGACC AAATCCGCGC CTTCGGTGTG 

151 GTTGCCGAAG ACAATGCCCA ATTGGAAAAG GGCAGCCTGG TGATGATGGG 

201 CGGGAAATAC TGGTTCGCCG TCAATCCCGA AGATTCGGCG AAGCTGACGG 

251 GCCTTTTGAA GGCCGGGTTG GACAAGCCCT TCCAAATAGT TGAGGATACC 

301 CCGAGCTATG CCCGCCACCA AGCCCTGCCG GTCAAATTCG AAGCGCCCGG

351 CAGCCAGAAT TTCAGTACCG GAGGTCTTTG CCTGCGCTAT GATACCGGCA 

401 GACCTGACGA CATCGCCAAG CTGAAACAGC TTGAGTTTAA AGCGGTCAAA 

451 CTCGACAATC GGACCATTTA CACGCGCTGC GTATCCGCCA AAGGCAAATA 

501 CTACGCCACG CCGCAAAAAC TGAACGCCGA TTATCATTTT GAGCAAAGTG 

551 TGCCCGCCGA TATTTATTAT ACGGTTACTG AAAAACATAC CGACAAATCC

601 AAGCTGTTTG GAAATATCTT ATATACGCCcTcCCTTGTTGA TATTGGATGC 

651 GGCGGCCGCG GTGCTGGTCT TGCCTATGGC TCTGATTGCA GCCGCGAATT 

701 CCTCAGACAA ATGA 

This encodes a protein having amino acid sequence <SEQ ID 162>: 

1 MLFRKTTAAV LAATLILNGC TMMLRGMNNP VSQTITRKHV DKDQIRAFGV

51 VAEDNAQLEK GSLVMMGGKY WFAVNPEDSA KLTGLLKAGL DKPFQIVEDT 

101 PSYARHQALP VKFEAPGSQN FSTGGLCLRY DTGRPDDIAK LKQLEFKAVK 

151 LDNRTIYTRC VSAKGKYYAT PQKLNADYHF EQSVPADIYY TVTEKHTDKS 

201 KLFGNILYTP PLLILDAAAA VLVLPMALIA AANSSDK*

ORF28ng and ORF28-1 share 90.0% identity in 231 aa overlap:

orf28-1.pep MLFRKTTAAVLAATLMLNGCTLMLWGMNNPVSETITRKHVDKDQIRAFGVVAEDNAQLEK 60
orf28ng     MLFRKTTAAVLAATLILNGCTMMLRGMNNPVSQTITRKHVDKDQIRAFGVVAEDNAQLEK 60

orf28-1.pep GSLVMMGGKYWFVVNPEDSAKLTGILKAGLDKPFQIVEDTPSYARHQALPVKLESPGSQN 120
orf28ng     GSLVMMGGKYWFAVNPEDSAKLTGLLKAGLDKPFQIVEDTPSYARHQALPVKFEAPGSQN 120

orf28-1.pep FSTEGLCLRYDTDKPADIAKLKQLGFEAVKLDNRTIYTRCVSAKGKYYATPQKLNADYHF 180
orf28ng     FSTGGLCLRYDTGRPDDIAKLKQLEFKAVKLDNRTIYTRCVSAKGKYYATPQKLNADYHF 180

orf28-1.pep EQSVPADIYYTVTEEHTDKSKLFANILYTPPFLILDAAGAVLALPAAALGAVVDAARKX 239
orf28ng     EQSVPADIYYTVTEKHTDKSKLFGNILYTPPLLILDAAAAVLVLPMALIAAANSSDKX 238

Based on this analysis, including the presence of a putative transmembrane domain in the 
gonococcal protein, it was predicted that the proteins from N. meningitidis and N. gonorrhoeae, and 
their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies. 

ORF28-1 (24 kDa) was cloned in pET and pGEX vectors and expressed in E. coli, as described
above. The products of protein expression and purification were analyzed by SDS-PAGE. Figure
6A shows the results of affinity purification of the GST-fusion protein, and Figure 6B shows the
results of expression of the His-fusion in E. coli. Purified GST-fusion protein was used to immunise
mice, whose sera were used for ELISA, which gave a positive result. These experiments confirm
that ORF28-1 is a surface-exposed protein, and that it may be a useful immunogen.

Example 20 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 163>: 

1 . . GTCAGTCCTG TACTGCCTAT TACACACGAA CGGACAGGGT TTGAAGGTGT 

51 TATCGGTTAT GAAACCCATT TTTCAGGGCA CGGACATGAA GTACACAGTC 

101 CGTTCGATCA TCATGATTCA AAAAGCACTT CTGATTTCAG CGGCGGTGTA 

151 GACGGCGGTT TTACTGTTTA CCAACTTCAT CGAACATGGT CGGAAATCCA 

201 TCCGGAGGAT GAATATGACG GGCCGCAAGC AGCG.ATTAT CCGCCCCCCG 

251 GAGGAGCAAG GGATATATAC AGCTATTATG TCAAAGGAAC TTCAACAAAA 

301 ACAAAGACTA GTATTGTCCC TCAAGCCCCA TTTTCAGACC GTTGGCTAGA 

351 AGAAAATGCC GGTGCCGCCT CTGGT. . 

This corresponds to the amino acid sequence <SEQ ID 164; ORF29>: 

1 . .VSPVLPITHE RTGFEGVIGY ETHFSGHGHE VHSPFDHHDS KSTSDFSGGV 
51 DGGFTVYQLH RTWSEIHPED EYDGPQAAXY PPPGGARDIY SYYVKGTSTK 
101 TKTSIVPQAP FSDRWLEENA GAASG. . 

Further work revealed the complete nucleotide sequence <SEQ ID 165>: 

1 ATGAATTTGC CTATTCAAAA ATTCATGATG CTGTTTGCAG CAGCAATATC 

51 GTTGCTGCAA ATCCCCATTA GTCATGCGAA CGGTTTGGAT GCCCGTTTGC 

101 GCGATGATAT GCAGGCAAAA CACTACGAAC CGGGTGGTAA ATACCATCTG 

151 TTTGGTAATG CTCGCGGCAG TGTTAAAAAG CGGGTTTACG CCGTCCAGAC 

201 ATTTGATGCA ACTGCGGTCA GTCCTGTACT GCCTATTACA CACGAACGGA 

251 CAGGGTTTGA AGGTGTTATC GGTTATGAAA CCCATTTTTC AGGGCACGGA 

301 CATGAAGTAC ACAGTCCGTT CGATCATCAT GATTCAAAAA GCACTTCTGA 

351 TTTCAGCGGC GGTGTAGACG GCGGTTTTAC TGTTTACCAA CTTCATCGAA 

401 CAGGGTCGGA AATCCATCCG GAGGATGGAT ATGACGGGCC GCAAGGCAGC 

451 GATTATCCGC CCCCCGGAGG AGCAAGGGAT ATATACAGCT ATTATGTCAA 

501 AGGAACTTCA ACAAAAACAA AGACTAATAT TGTCCCTCAA GCCCCATTTT 

551 CAGACCGTTG GCTAAAAGAA AATGCCGGTG CCGCCTCTGG TTTTTTCAGC 

601 CGTGCGGATG AAGCAGGAAA ACTGATATGG GAAAGCGACC CCAATAAAAA 

651 TTGGTGGGCT AACCGTATGG ATGATGTTCG CGGCATCGTC CAAGGTGCGG 

701 TTAATCCTTT TTTAATGGGT TTTCAAGGAG TAGGGATTGG GGCAATTACA 

751 GACAGTGCAG TAAGCCCGGT CACAGATACA GCCGCGCAGC AGACTCTACA 

801 AGGTATTAAT GATTTAGGAA AATTAAGTCC GGAAGCACAA CTTGCTGCCG 

851 CGAGCCTATT ACAGGACAGT GCTTTTGCGG TAAAAGACGG TATCAACTCT 

901 GCCAAACAAT GGGCTGATGC CCATCCAAAT ATAACAGCTA CTGCCCAAAC 

951 TGCCCTTTCC GCAGCAGAGG CCGCAGGTAC GGTTTGGAGA GGTAAAAAAG 

1001 TAGAACTTAA CCCGACTAAA TGGGATTGGG TTAAAAATAC CGGTTATAAA 

1051 AAACCTGCTG CCCGCCATAT GCAGACTTTA GATGGGGAGA TGGCAGGTGG 

1101 GAATAAACCT ATTAAATCTT TACCAAACAG TGCCGCTGAA AAAAGAAAAC 

1151 AAAATTTTGA GAAGTTTAAT AGTAACTGGA GTTCAGCAAG TTTTGATTCA 
1201 GTGCACAAAA CACTAACTCC CAATGCACCT GGTATTTTAA GTCCTGATAA

1251 AGTTAAAACT CGATACACTA GTTTAGATGG AAAAATTACA ATTATAAAAG 

1301 ATAACGAAAA CAACTATTTT AGAATCCATG ATAATTCACG AAAACAGTAT 

1351 CTTGATTCAA ATGGTAATGC TGTGAAAACC GGTAATTTAC AAGGTAAGCA 

1401 AGCAAAAGAT TATTTACAAC AACAAACTCA TATCAGGAAC TTAGACAAAT 

1451 GA 

This corresponds to the amino acid sequence <SEQ ID 166; ORF29-1>:



1 MNLPIQKFMM LFAAAISLLQ IPISHANGLD ARLRDDMQAK HYEPGGKYHL

51 FGNARGSVKK RVYAVQTFDA TAVSPVLPIT HERTGFEGVI GYETHFSGHG 

101 HEVHSPFDHH DSKSTSDFSG GVDGGFTVYQ LHRTGSEIHP EDGYDGPQGS 

151 DYPPPGGARD IYSYYVKGTS TKTKTNIVPQ APFSDRWLKE NAGAASGFFS 

201 RADEAGKLIW ESDPNKNWWA NRMDDVRGIV QGAVNPFLMG FQGVGIGAIT 

251 DSAVSPVTDT AAQQTLQGIN DLGKLSPEAQ LAAASLLQDS AFAVKDGINS 

301 AKQWADAHPN ITATAQTALS AAEAAGTVWR GKKVELNPTK WDWVKNTGYK 

351 KPAARHMQTL DGEMAGGNKP IKSLPNSAAE KRKQNFEKFN SNWSSASFDS 

401 VHKTLTPNAP GILSPDKVKT RYTSLDGKIT IIKDNENNYF RIHDNSRKQY 

451 LDSNGNAVKT GNLQGKQAKD YLQQQTHIRN LDK* 

Computer analysis of this amino acid sequence gave the following results: 



Homology with a predicted ORF from N. meningitidis (strain A) 

ORF29 shows 88.0% identity over a 125aa overlap with an ORF (ORF29a) from strain A of N. 
meningitidis: 



orf29.pep                               VSPVLPITHERTGFEGVIGYETHFSGHGHE 30
orf29a    EPGGKYHLFGNARGSVKNRVYAVQTFDATAVGPILPITHERTGFEGIIGYETHFSGHGHE 102

orf29.pep VHSPFDHHDSKSTSDFSGGVDGGFTVYQLHRTWSEIHPEDEYDGPQAAXYPPPGGARDIY 90
orf29a    VHSPFDNHDSKSTSDFSGGVDGGFTVYQLHRTGSEIHPEDGYDGPQGSDYPPPGGARDIY 162

orf29.pep SYYVKGTSTKTKTSIVPQAPFSDRWLEENAGAASG 125
orf29a    XXYVKGTSTKTKSNIVPRAPFSDRWLKENAGAASGFFSRADEAGKLIWESDPNKNWWANR 222

orf29a    MDDIRGIVQGAVNPFLMGFQGVGIGAITDSAVSPVTDTAAQQTLQGXNHLGXLSPEAQLA 282

The complete length ORF29a nucleotide sequence <SEQ ID 167> is: 

1 ATGAATTNGC CTATTCAAAA ATTCATGATG CTGTTTGCAG CAGCAATATC 

51 GTNGCTGCAA ATCCCNATTA GTCATGCGAA CGGTTTGGAT GCCCGTTTGC 

101 GCGATGATAT GCAGGCAAAA CACTACGAAC CGGGTGGTAA ATACCATCTG 

151 TTTGGTAATG CTCGCGGCAG TGTTAAAAAT CGGGTTTACG CCGTCCAAAC 

201 ATTTGATGCA ACTGCGGTCG GCCCCATACT GCCTATTACA CACGAACGGA 

251 CAGGATTTGA AGGCATTATC GGTTATGAAA CCCATTTTTC AGGACATGGA 

301 CATGAAGTAC ACAGTCCGTT CGATAATCAT GATTCAAAAA GCACTTCTGA 

351 TTTCAGCGGC GGCGTAGACG GTGGTTTTAC CGTTTACCAA CTTCATCGGA 

401 CAGGGTCGGA AATCCATCCG GAGGATGGAT ATGACGGGCC GCAAGGCAGC 

451 GATTATCCGC CCCCCGGAGG AGCAAGGGAT ATATACANNT ANTATGTCAA 

501 AGGAACTTCA ACAAAAACAA AGAGTAATAT TGTTCCCCGA GCCCCATTTT 

551 CAGACCGCTG GCTAAAAGAA AATGCCGGTG CCGCCTCTGG TTTTTTCAGC 

601 CGTGCTGATG AAGCAGGAAA ACTGATATGG GAAAGCGACC CCAATAAAAA 

651 TTGGTGGGCT AACCGTATGG ATGATATTCG CGGCATCGTC CAAGGTGCGG 

701 TTAATCCTTT TTTAATGGGT TTTCAAGGAG TAGGGATTGG GGCAATTACA 

751 GACAGTGCAG TAAGCCCGGT CACAGATACA GCCGCGCAGC AGACTCTACA 

801 AGGTATNAAT CATTTAGGAA ANTTAAGTCC CGAAGCACAA CTTGCGGCTG 

851 CAACCGCATT ACAAGACAGT GCTTTTGCGG TAAAAGACGG TATCAATTCC 

901 GCCAGACAAT GGGCTGATGC CCATCCGAAT ATAACTGCAA CAGCCCAAAC 
951 TGCCCTTGCC GTAGCAGANG CCGCAACTAC GGTTTGGGGC GGTAAAAAAG

1001 TAGAACTTAA CCCGACCAAA TGGGATTGGG TTAAAAATAC NGGCTATAAN

1051 ACACCTGCTG TTCGCACCAT GCATACTTTG GATGGGGAAA TGGCCGGTGG

1101 GAATAGACCG CCTAAATCTA TAACGTCCAA CAGCAAAGCA GATGCTTCCA

1151 CACAACCGTC TTTACAAGCG CAACTAATTG GAGAACAAAT TANNNNNGGG

1201 CATGCTTATA ACAAGCATGT CATAAGACAA CAAGAATTTA CGGATTTAAA

1251 TATCAATTCA CCAGCAGATT TTGCTCGGCA TATTGAAAAT ATTGTTAGCC

1301 ATCCANCAAA TATGAAAGAG TTACCTCGCG GTAGAACTGC GTATTGGGAT

1351 NATAAAACAG GGACNATAGT TATCCGAGAT AAAAATTCTG ACGATGGAGG

1401 TACAGCATTT AGACCAACAT CAGGTAAAAA ATATTATGAT GATTTATAG

This encodes a protein having amino acid sequence <SEQ ID 168>:



1 MNXPIQKFMM LFAAAISXLQ IPISHANGLD ARLRDDMQAK HYEPGGKYHL

51 FGNARGSVKN RVYAVQTFDA TAVGPILPIT HERTGFEGII GYETHFSGHG 

101 HEVHSPFDNH DSKSTSDFSG GVDGGFTVYQ LHRTGSEIHP EDGYDGPQGS 

151 DYPPPGGARD IYXXYVKGTS TKTKSNIVPR APFSDRWLKE NAGAASGFFS 

201 RADEAGKLIW ESDPNKNWWA NRMDDIRGIV QGAVNPFLMG FQGVGIGAIT 

251 DSAVSPVTDT AAQQTLQGXN HLGXLSPEAQ LAAATALQDS AFAVKDGINS 

301 ARQWADAHPN ITATAQTALA VAXAATTVWG GKKVELNPTK WDWVKNTGYX 

351 TPAVRTMHTL DGEMAGGNRP PKSITSNSKA DASTQPSLQA QLIGEQIXXG 

401 HAYNKHVIRQ QEFTDLNINS PADFARHIEN IVSHPXNMKE LPRGRTAYWD 

451 XKTGTIVIRD KNSDDGGTAF RPTSGKKYYD DL* 

ORF29a and ORF29-1 show 90.1% identity in 385 aa overlap: 



orf29a.pep MNXPIQKFMMLFAAAISXLQIPISHANGLDARLRDDMQAKHYEPGGKYHLFGNARGSVKN 60
orf29-1    MNLPIQKFMMLFAAAISLLQIPISHANGLDARLRDDMQAKHYEPGGKYHLFGNARGSVKK 60

orf29a.pep RVYAVQTFDATAVGPILPITHERTGFEGIIGYETHFSGHGHEVHSPFDNHDSKSTSDFSG 120
orf29-1    RVYAVQTFDATAVSPVLPITHERTGFEGVIGYETHFSGHGHEVHSPFDHHDSKSTSDFSG 120

orf29a.pep GVDGGFTVYQLHRTGSEIHPEDGYDGPQGSDYPPPGGARDIYXXYVKGTSTKTKSNIVPR 180
orf29-1    GVDGGFTVYQLHRTGSEIHPEDGYDGPQGSDYPPPGGARDIYSYYVKGTSTKTKTNIVPQ 180

orf29a.pep APFSDRWLKENAGAASGFFSRADEAGKLIWESDPNKNWWANRMDDIRGIVQGAVNPFLMG 240
orf29-1    APFSDRWLKENAGAASGFFSRADEAGKLIWESDPNKNWWANRMDDVRGIVQGAVNPFLMG 240

orf29a.pep FQGVGIGAITDSAVSPVTDTAAQQTLQGXNHLGXLSPEAQLAAATALQDSAFAVKDGINS 300
orf29-1    FQGVGIGAITDSAVSPVTDTAAQQTLQGINDLGKLSPEAQLAAASLLQDSAFAVKDGINS 300

orf29a.pep ARQWADAHPNITATAQTALAVAXAATTVWGGKKVELNPTKWDWVKNTGYXTPAVRTMHTL 360
orf29-1    AKQWADAHPNITATAQTALSAAEAAGTVWRGKKVELNPTKWDWVKNTGYKKPAARHMQTL 360

orf29a.pep DGEMAGGNRPPKSITSNSKADASTQPSLQAQLIGEQIXXGHAYNKHVIRQQEFTDLNINS 420
orf29-1    DGEMAGGNKPIKSLP-NSAAEKRKQNFEKFNSNWSSASFDSVHKTLTPNAPGILSPDKVK 419
Homology with a predicted ORF from N. gonorrhoeae

ORF29 shows 88.8% identity over a 125aa overlap with a predicted ORF (ORF29.ng) from N. gonorrhoeae:

orf29.pep                               VSPVLPITHERTGFEGVIGYETHFSGHGHE 30
orf29ng   EPGGKYHLFGNARGSVKNRVCAVQTFDATAVGPILPITHERTGFEGVIGYETHFSGHGHE 102

orf29.pep VHSPFDHHDSKSTSDFSGGVDGGFTVYQLHRTWSEIHPEDEYDGPQAAXYPPPGGARDIY 90
orf29ng   VHSPFDNHDSKSTSDFSGGVDGGFTVYQLHRTGSEIHPEDGYDGPQGGGYPPPGGARDIY 162

orf29.pep SYYVKGTSTKTKTSIVPQAPFSDRWLEENAGAASG 125
orf29ng   SYHIKGTSTKTKINTVPQAPFSDRWLKENAGAASGFLSRADEAGKLIWENDPDKNWRANR 222



The complete length ORF29ng nucleotide sequence <SEQ ID 169> is predicted to encode a protein having amino acid sequence <SEQ ID 170>:

1 MNLPIQKFMM LFAAAISLLQ IPISHANGLP ARLRDDMQAK HYEPGGKYHL

51 FGNARGSVKN RVCAVQTFDA TAVGPILPIT HERTGFEGVI GYETHFSGHG

101 HEVHSPFDNH DSKSTSDFSG GVDGGFTVYQ LHRTGSEIHP EDGYDGPQGG

151 GYPPPGGARD IYSYHIKGTS TKTKINTVPQ APFSDRWLKE NAGAASGFLS

201 RADEAGKLIW ENDPDKNWRA NRMDDIRGIV QGAVNPFLTG FQGLGVGAIT

251 DSAVSPVTYA AARKTLQGIH NLGNLSPEAQ LAAATALQDS AFAVKDSINS

301 ARQWADAHPN ITATAQTALA VTEAATTVWG GKKVELNPAK WDWVKNTGYK

351 KPAARHMQTV DGEMAGGNKP LESKNTVTTN NFFENTGYTE KVLRQASNGD

401 YHGFPQSVDA FSENGTVIQI VGGDNIVRHK LYIPGSYKGK DGNFEYIREA

451 DGKINHRLFV PNQQLPEK*



In a second experiment, the following DNA sequence <SEQ ID 171> was identified: 



1 atgAATTTGC CTATTCAAAA ATTCATGATG ctgttggcAg cggcaatatc

51 gatgctGCat ATCCCCATTA GTCATGCGAA CGGTTTGGAT GCCCGTTTGC

101 GCGATGATAT GCAGGCAAAA CACTACGAAC CGGGTGGCAA ATACCATCTG

151 TTTGGTAATG CTCGCGGCAG TGTTAAAAAT CGGGTTTGCG CCGTCCAAAC

201 ATTTGATGCA ACTGCGGTCG GCCCCATACT GCCTATTACA CACGAACGGA

251 CAGGATTTGA AGGTGTTATC GGCTATGAAA CCCATTTTTC AGGACACGGA

301 CACGAAGTAC ACAGTCCGTT CGATAATCAT GATTCAAAAA GCACTTCTGA

351 TTTCAGCGGC GGCGTAGACG GCGGTTTTAC CGTTTACCAA CTTCATCGGA

401 CAGGGTCGGA AATACATCCC GCAGACGGAT ATGACGGGCC TCAAGGCGGC

451 GGTTATCCGG AACCACAAGG GGCAAGGGAT ATATACAGCT ACCATATCAA

501 AGGAACTTCA ACCAAAACAA AGATAAACAC TGTTCCGCAA GCCCCTTTTT

551 CAGACCGCTG GCTAAAAGAA AATGCCGGTG CCGCTTCCGG TTTTCTCAGC

601 CGTGCGGATG AAGCAGGAAA ACTGATATGG GAAAACGACC CCGATAAAAA

651 TTGGCGGGCT AACCGTATGG ATGATATTCG CGGCATCGTC CAAGGTGCGG

701 TTAATCCTTT TTTAACGGGT TTTCAAGGGG TAGGGATTGG GGCAATTACA

751 GACAGTGCGG TAAGCCCGGT CACAGATACA GCCGCTCAGC AGACTCTACA

801 AGGTATTAAT GATTTAGGAA ATTTAAGTCC GGAAGCACAA CTTGCCGCCG

851 CGAGCCTATT ACAGGACAGT GCCTTTGCGG TAAAAGACGG CATCAATTCC

901 GCCAGACAAT GGGCTGATGC CCATCCGAAT ATAACAGCAA CAGCCCAAAC

951 TGCCCTTGCC GTAGCAGAGG CCGCAGGTAC GGTTTGGCGC GGTAAAAAAG

1001 TAGAACTTAA CCCGACCAAA TGGGATTGGG TTAAAAATAC CGGCTATAAA

1051 AAACCTGCTG CCCGCCATAT GCAGACTGTA GATGGGGAGA TGGCAGGGGG

1101 GAATAGACCG CCTAAATCTA TAACGTCGGA AGGAAAAGCT AATGCTGCAA

1151 CCTATCCTAA GTTGGTTAAT CAGCTAAATG AGCAAAACTT AAATAACATT

1201 GCGGCTCAAG ATCCAAGATT GAGTCTAGCT ATTCATGAGG GTAAAAAAAA

1251 TTTTCCAATA GGAACTGCAA CTTATGAAGA GGCAGATAGA CTAGGTAAAA

1301 TTTGGGTTGG TGAGGGTGCA AGACAAACTA GTGGAGGCGG ATGGTTAAGT

1351 AGAGATGGCA CTCGACAATA TCGGCCACCA ACAGAAAAAA AATCACAATT

1401 TGCAACTACA GGTATTCAAG CAAATTTTGA AACTTATACT ATTGATTCAA

1451 ATGAAAAAAG AAATAAAATT AAAAATGGAC ATTTAAATAT TAGGTAA



This encodes a protein having amino acid sequence <SEQ ID 172; ORF29ng-1>:



1 MNLPIQKFMM LLAAAISMLH IPISHANGLD ARLRDDMQAK HYEPGGKYHL
51 FGNARGSVKN RVCAVQTFDA TAVGPILPIT HERTGFEGVI GYETHFSGHG 
101 HEVHSPFDNH DSKSTSDFSG GVDGGFTVYQ LHRTGSEIHP ADGYDGPQGG

151 GYPEPQGARD IYSYHIKGTS TKTKINTVPQ APFSDRWLKE NAGAASGFLS 

201 RADEAGKLIW ENDPDKNWRA NRMDDIRGIV QGAVNPFLTG FQGVGIGAIT 

251 DSAVSPVTDT AAQQTLQGIN DLGNLSPEAQ LAAASLLQDS AFAVKDGINS 

301 ARQWADAHPN ITATAQTALA VAEAAGTVWR GKKVELNPTK WDWVKNTGYK 

351 KPAARHMQTV DGEMAGGNRP PKSITSEGKA NAATYPKLVN QLNEQNLNNI 

401 AAQDPRLSLA IHEGKKNFPI GTATYEEADR LGKIWVGEGA RQTSGGGWLS 

451 RDGTRQYRPP TEKKSQFATT GIQANFETYT IDSNEKRNKI KNGHLNIR* 



ORF29ng-1 and ORF29-1 show 86.0% identity in 401 aa overlap:

orf29ng-1.pep MNLPIQKFMMLLAAAISMLHIPISHANGLDARLRDDMQAKHYEPGGKYHLFGNARGSVKN 60
orf29-1       MNLPIQKFMMLFAAAISLLQIPISHANGLDARLRDDMQAKHYEPGGKYHLFGNARGSVKK 60

orf29ng-1.pep RVCAVQTFDATAVGPILPITHERTGFEGVIGYETHFSGHGHEVHSPFDNHDSKSTSDFSG 120
orf29-1       RVYAVQTFDATAVSPVLPITHERTGFEGVIGYETHFSGHGHEVHSPFDHHDSKSTSDFSG 120

orf29ng-1.pep GVDGGFTVYQLHRTGSEIHPADGYDGPQGGGYPEPQGARDIYSYHIKGTSTKTKINTVPQ 180
orf29-1       GVDGGFTVYQLHRTGSEIHPEDGYDGPQGSDYPPPGGARDIYSYYVKGTSTKTKTNIVPQ 180

orf29ng-1.pep APFSDRWLKENAGAASGFLSRADEAGKLIWENDPDKNWRANRMDDIRGIVQGAVNPFLTG 240
orf29-1       APFSDRWLKENAGAASGFFSRADEAGKLIWESDPNKNWWANRMDDVRGIVQGAVNPFLMG 240

orf29ng-1.pep FQGVGIGAITDSAVSPVTDTAAQQTLQGINDLGNLSPEAQLAAASLLQDSAFAVKDGINS 300
orf29-1       FQGVGIGAITDSAVSPVTDTAAQQTLQGINDLGKLSPEAQLAAASLLQDSAFAVKDGINS 300

orf29ng-1.pep ARQWADAHPNITATAQTALAVAEAAGTVWRGKKVELNPTKWDWVKNTGYKKPAARHMQTV 360
orf29-1       AKQWADAHPNITATAQTALSAAEAAGTVWRGKKVELNPTKWDWVKNTGYKKPAARHMQTL 360

orf29ng-1.pep DGEMAGGNRPPKSI-TSEGKANAATYPKLVNQLNEQNLNNIAAQDPRLSLAIHEGKKNFP 419
orf29-1       DGEMAGGNKPIKSLPNSAAEKRKQNFEKFNSNWSSASFDSVHKTLTPNAPGILSPDKVKT 420

orf29ng-1.pep IGTATYEEADRLGKIWVGEGARQTSGGGWLSRDGTRQYRPPTEKKSQFATTGIQANFETY 479
orf29-1       RYTSLDGKITIIKDNENNYFRIHDNSRKQYLDSNGNAVKTGNLQGKQAKDYLQQQTHIRN 480



Based on this analysis, including the presence of a putative leader sequence in the gonococcal
protein, it is predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes,
could be useful antigens for vaccines or diagnostics, or for raising antibodies.



Example 21 



The following partial DNA sequence was identified in N. meningitidis <SEQ ID 173>: 
1 ATGAAAAAAC AAATCACCGC AGCCGTAATG ATGCTGTCTA TGATTGCCCC
51 CGCAATGGCA AACGGCTTGG ACAATCAGGC ATTTGAAGAC CAAATGTTCC 
101 ACACGCGGGC AGATGCACCG ATGCAG. . . 

This corresponds to the amino acid sequence <SEQ ID 174; ORF30>: 



1 MKKQITAAVM MLSMIAPAMA NGLDNQAFED QMFHTRADAP MQ. . 
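Each amino acid listing follows from its nucleotide listing by translation with the standard genetic code, with `*` marking a stop codon and `X` marking a codon that cannot be resolved because of an ambiguous base. In the ORF29a listing, ACN is given as T while AAN is given as X, which is consistent with accepting an N only when all four possible bases give the same amino acid. The sketch below implements that rule; the function names are illustrative, and treating ambiguity codes other than N exactly like N is an assumption.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code; codons ordered TTT, TTC, TTA, TTG, TCT, ..., GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate_codon(codon):
    """Translate one codon. Any base other than T, C, A or G is expanded to
    all four bases; the residue is kept only if every expansion agrees,
    otherwise 'X' is returned."""
    choices = [base if base in BASES else BASES for base in codon]
    residues = {CODON_TABLE["".join(c)] for c in product(*choices)}
    return residues.pop() if len(residues) == 1 else "X"


def translate(dna):
    """Translate a nucleotide listing in frame 1, ignoring spaces, case and
    any incomplete final codon."""
    dna = dna.replace(" ", "").upper()
    return "".join(translate_codon(dna[i:i + 3])
                   for i in range(0, len(dna) - 2, 3))


# The partial ORF30 sequence <SEQ ID 173> given above.
orf30_partial = ("ATGAAAAAAC AAATCACCGC AGCCGTAATG ATGCTGTCTA TGATTGCCCC "
                 "CGCAATGGCA AACGGCTTGG ACAATCAGGC ATTTGAAGAC CAAATGTTCC "
                 "ACACGCGGGC AGATGCACCG ATGCAG")
print(translate(orf30_partial))   # MKKQITAAVMMLSMIAPAMANGLDNQAFEDQMFHTRADAPMQ
print(translate("ACNGGCTATAAN"))  # TGYX (compare residues 347-350 of ORF29a)
```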

Further work revealed the complete nucleotide sequence <SEQ ID 175>: 

1 ATGAAAAAAC AAATCACCGC AGCCGTAATG ATGCTGTCTA TGATTGCCCC 

51 CGCAATGGCA AACGGCTTGG ACAATCAGGC ATTTGAAGAC CAAGTGTTCC 

101 ACACGCGGGC AGATGCACCG ATGCAGTTGG CGGAGCTTTC TCAAAAGGAG 

151 ATGAAGGAGA CAGAGGGGGC GTTTCTTCCA TTGGCTATCT TGGGTGGTGC 

201 TGCCATTGGT ATGTGGACAC AGCATGGTTT TAGTTATGCA ACGACAGGCA 

251 GACCAGCTTC TGTTAGAGAT GTTGCTATTG CTGGCGGATT AGGCGCAATT 

301 CCTGGTGGTG TAGGCGCCGC AGGAAAGGTT GTTTCCTTTG CTAAATATGG 

351 ACGTGAGATT AAAATCGGCA ATAATATGCG GATAGCCCCT TTCGGTAATA 

401 GAACAGGTCA TCCTATTGGA AAATTTCCCC ATTATCATCG TCGAGTTACG 

451 GATAATACGG GCAAGACTTT GCCTGGACAG GGAATTGGTC GTCATCGCCC 

501 TTGGGAATCA AAATCTACGG ACAGATCATG GAAAAACCGC TTCTAA 

This corresponds to the amino acid sequence <SEQ ID 176; ORF30-1>: 



1 MKKQITAAVM MLSMIAPAMA NGLDNQAFED QVFHTRADAP MQLAELSQKE 

51 MKETEGAFLP LAILGGAAIG MWTQHGFSYA TTGRPASVRD VAIAGGLGAI

101 PGGVGAAGKV VSFAKYGREI KIGNNMRIAP FGNRTGHPIG KFPHYHRRVT 

151 DNTGKTLPGQ GIGRHRPWES KSTDRSWKNR F* 

Computer analysis of this amino acid sequence gave the following results: 



Homology with a predicted ORF from N. meningitidis (strain A) 

ORF30 shows 97.6% identity over a 42aa overlap with an ORF (ORF30a) from strain A of N. 
meningitidis: 

orf30.pep MKKQITAAVMMLSMIAPAMANGLDNQAFEDQMFHTRADAPMQ 42
orf30a    MKKQITAAVMMLSMIAPAMANGLDNQAFEDQVFHTRADAPMQLAELSQKEMKXTXGAFLP 60

orf30a    LXILGGAAIGMWTQHGFSYATTGRPASVRDVAIAGGLGAIPGXVGAAGKVVSFAKYGREI 120

The complete length ORF30a nucleotide sequence <SEQ ID 177> is: 



1 ATGAAAAAAC AAATCACCGC AGCCGTAATG ATGCTGTCTA TGATTGCCCC 

51 CGCAATGGCA AACGGCTTGG ACAATCAGGC ATTTGAAGAC CAAGTGTTCC 

101 ACACGCGGGC AGATGCACCG ATGCAGTTGG CGGAGCTTTC TCAAAAGGAG 

151 ATGAAGGANA CAGNGGGGGC GTTTCTTCCA TTGGNTATCT TGGGTGGTGC 

201 TGCCATTGGT ATGTGGACAC AGCATGGTTT TAGTTATGCA ACGACAGGCA 

251 GACCAGCTTC TGTTAGAGAT GTTGCTATTG CTGGCGGATT AGGCGCAATT 

301 CCTGGTGNTG TAGGCGCCGC AGGAAAGGTT GTTTCCTTTG CTAAATATGG 

351 ACGTGAGATT AAAATCGGCA ATAATATGCG GATAGCCCCT TTCGGTAATA 

401 GAACAGGTCA TCCTATTGGN AAATTTCCCC ATTATCATCG TCGAGTTACG 

451 GATAATACGG GCAAGACTTT GCCTGGACAG GGAATTGGTC GTCATCGCCC 

501 TTGGGAATCA AAATCTACGG ACAGATCATG GAAAAACCGC TTCTAA 

This encodes a protein having amino acid sequence <SEQ ID 178>: 



1 MKKQITAAVM MLSMIAPAMA NGLDNQAFED QVFHTRADAP MQLAELSQKE

51 MKXTXGAFLP LXILGGAAIG MWTQHGFSYA TTGRPASVRD VAIAGGLGAI

101 PGXVGAAGKV VSFAKYGREI KIGNNMRIAP FGNRTGHPIG KFPHYHRRVT 

151 DNTGKTLPGQ GIGRHRPWES KSTDRSWKNR F* 
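The step from the ORF30a nucleotide sequence <SEQ ID 177> to the protein <SEQ ID 178> is a codon-by-codon translation, in which any codon containing an undetermined base (N) gives an undetermined residue (X). A minimal Python sketch of that step, assuming the standard codon assignments (NCBI translation table 1, which the bacterial table 11 shares) and a reading frame starting at the first base; the names `translate` and `CODON_TABLE` are illustrative only and are not part of the original analysis:

```python
# Standard codon assignments, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate a reading frame that starts at the first base.

    Row numbers, spaces and other non-letters are dropped, so rows of a
    listing can be pasted as they are. A codon with any base other than
    A, C, G or T (for example N) becomes 'X'; '*' marks a stop codon.
    """
    seq = "".join(ch for ch in dna.upper() if ch.isalpha())
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, usable, 3))


print(translate("1 ATGAAAAAAC AAATCACCGC AGCCGTAATG ATGCTGTCTA TGATTGCCCC 51 CGCAATGGCA"))
# MKKQITAAVMMLSMIAPAMA
print(translate("151 ATGAAGGANA CAGNGGGGGC GTTTCTTCCA"))
# MKXTXGAFLP
```

The second call reproduces the "MKXTX" run at residues 51-55 of ORF30a, where the codons GAN and GNG are undetermined.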

ORF30a and ORF30-1 show 97.8% identity in 181 aa overlap: 



orf30a.pep  MKKQITAAVMMLSMIAPAMANGLDNQAFEDQVFHTRADAPMQLAELSQKEMKXTXGAFLP  60
            |||||||||||||||||||||||||||||||||||||||||||||||||||| | |||||
orf30-1     MKKQITAAVMMLSMIAPAMANGLDNQAFEDQVFHTRADAPMQLAELSQKEMKETEGAFLP  60

orf30a.pep  LXILGGAAIGMWTQHGFSYATTGRPASVRDVAIAGGLGAIPGXVGAAGKVVSFAKYGREI 120
            | |||||||||||||||||||||||||||||||||||||||| |||||||||||||||||
orf30-1     LAILGGAAIGMWTQHGFSYATTGRPASVRDVAIAGGLGAIPGGVGAAGKVVSFAKYGREI 120

orf30a.pep  KIGNNMRIAPFGNRTGHPIGKFPHYHRRVTDNTGKTLPGQGIGRHRPWESKSTDRSWKNR 180
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf30-1     KIGNNMRIAPFGNRTGHPIGKFPHYHRRVTDNTGKTLPGQGIGRHRPWESKSTDRSWKNR 180

orf30a.pep  FX
            ||
orf30-1     FX
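The identity figures quoted for these comparisons can be read as the number of identical aligned positions divided by the length of the overlap. A minimal sketch, assuming that gaps ('-') and undetermined residues (X) count as non-identical, that the overlap includes gap columns, and that the terminal stop is excluded; the text does not name the program actually used, but under these assumptions the sketch reproduces the 97.8% given above for ORF30a and ORF30-1:

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percentage of identical columns in a gapped pairwise alignment.

    Both strings hold one character per alignment column. Gap columns ('-')
    and undetermined residues ('X') never count as identical, and the
    overlap length is the number of columns, gap columns included.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_a:
        raise ValueError("empty alignment")
    identical = sum(
        1 for a, b in zip(aligned_a, aligned_b) if a == b and a not in "-X"
    )
    return 100.0 * identical / len(aligned_a)


# ORF30a and ORF30-1 as aligned above, residues 1-181 (stop codon excluded).
ORF30A = (
    "MKKQITAAVMMLSMIAPAMANGLDNQAFEDQVFHTRADAPMQLAELSQKEMKXTXGAFLP"
    "LXILGGAAIGMWTQHGFSYATTGRPASVRDVAIAGGLGAIPGXVGAAGKVVSFAKYGREI"
    "KIGNNMRIAPFGNRTGHPIGKFPHYHRRVTDNTGKTLPGQGIGRHRPWESKSTDRSWKNRF"
)
ORF30_1 = (
    "MKKQITAAVMMLSMIAPAMANGLDNQAFEDQVFHTRADAPMQLAELSQKEMKETEGAFLP"
    "LAILGGAAIGMWTQHGFSYATTGRPASVRDVAIAGGLGAIPGGVGAAGKVVSFAKYGREI"
    "KIGNNMRIAPFGNRTGHPIGKFPHYHRRVTDNTGKTLPGQGIGRHRPWESKSTDRSWKNRF"
)

print(f"{percent_identity(ORF30A, ORF30_1):.1f}% identity in {len(ORF30A)} aa overlap")
# 97.8% identity in 181 aa overlap
```

With the two-residue deletion in ORF30ng written as '--', the same rule gives the 98.3% quoted for ORF30ng against ORF30-1.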



Homology with a predicted ORF from N. gonorrhoeae

ORF30 shows 97.6% identity over a 42 aa overlap with a predicted ORF (ORF30.ng) from N.
gonorrhoeae:

orf30.pep   MKKQITAAVMMLSMIAPAMANGLDNQAFEDQMFHTRADAPMQ                    42
            |||||||||||||||||||||||||||||||:||||||||||
orf30ng     MKKQITAAVMMLSMIAPAMANGLDNQAFEDQVFHTRADAPMQLAELSQKEMKETEGAFLP  60

The complete length ORF30ng nucleotide sequence <SEQ ID 179> is:

1 ATGAAAAAAC AAATCACCGC AGCCGTAATG ATGCTGTCTA TGATCGCCCC 

51 CGCAATGGCA AACGGATTGG ACAATCAGGC ATTTGAAGAC CAAGTGTTCC 

101 ACACGCGGGC AGATGCGCCG ATGCAGTTGG CGGAGCTTTC TCAGAAGGAG 

151 ATGAAGGAGA CTGAAGGGGC TTTTCTTCCA TTGGCTATCT TGGGTGGTGC 

201 TGCCATTGGT ATGTGGACAC AGCATGGTTT TAGTTATGCA ACGACAGGCA 

251 GACCAGCTTC TGTTAGAGAT GTTGCTGGCG GATTAGGCGC AATTCCTGGT 

301 GATGTAGGTG CTGCAGGAAA GGTTGTTTCC TTTGCTAAAT ATGGACGTGA 

351 GATTAAAATC GGCAATAATA TGCGGATAGC CCCTTTCGGT AATAGAACAG 

401 GTCATCCTAT TGGAAAATTT CCCCATTATC ATCGTCGAGT TACGGATAAT 

451 ACGGGCAAGA CTTTGCCTGG ACAGGGAATT GGTCGTCATC GCCCTTGGGA 

501 ATCAAAATCT ACGGACAGAT CATGGAAAAA CCGCTTCTAA 

This encodes a protein having amino acid sequence <SEQ ID 180>: 

1 MKKQITAAVM MLSMIAPAMA NGLDNQAFED QVFHTRADAP MQLAELSQKE

51 MKETEGAFLP LAILGGAAIG MWTQHGFSYA TTGRPASVRD VAGGLGAIPG

101 DVGAAGKVVS FAKYGREIKI GNNMRIAPFG NRTGHPIGKF PHYHRRVTDN

151 TGKTLPGQGI GRHRPWESKS TDRSWKNRF* 

ORF30ng and ORF30-1 show 98.3% identity in 181 aa overlap: 



orf30ng.pep MKKQITAAVMMLSMIAPAMANGLDNQAFEDQVFHTRADAPMQLAELSQKEMKETEGAFLP  60
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf30-1     MKKQITAAVMMLSMIAPAMANGLDNQAFEDQVFHTRADAPMQLAELSQKEMKETEGAFLP  60

orf30ng.pep LAILGGAAIGMWTQHGFSYATTGRPASVRDVA--GGLGAIPGDVGAAGKVVSFAKYGREI 118
            ||||||||||||||||||||||||||||||||  |||||||| |||||||||||||||||
orf30-1     LAILGGAAIGMWTQHGFSYATTGRPASVRDVAIAGGLGAIPGGVGAAGKVVSFAKYGREI 120

orf30ng.pep KIGNNMRIAPFGNRTGHPIGKFPHYHRRVTDNTGKTLPGQGIGRHRPWESKSTDRSWKNR 178
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf30-1     KIGNNMRIAPFGNRTGHPIGKFPHYHRRVTDNTGKTLPGQGIGRHRPWESKSTDRSWKNR 180

orf30ng.pep FX
            ||
orf30-1     FX
Based on this analysis, including the presence of a putative leader sequence in the gonococcal
protein, it is predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes,
could be useful antigens for vaccines or diagnostics, or for raising antibodies.



Example 22 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 181>: 

1 ATGAATAAAA CTCTCTATCG TGTAATTTTC AACCGCAAAC GTGGGGCTGT 

51 GrTAGCCGTT GCTGAAACTA CCAAGCGCGA AGGTAAAAGC TGTGCCGATA 

101 GTGATTCAGG CAGCGCTCAT GTGAAATCTG TTCCTTTTGG TACTACTCAT 

151 GCACCTGTTT GTg.CGTTaC AAATATCTTT TCTTTTTCTT TATTGGGCTT 

201 TTCTTTATGT TTGGCTGTAG GtacGGyCAA TATTGCTTTT GCTGATGGCA 

251 TT.. 

This corresponds to the amino acid sequence <SEQ ID 182; ORF31>: 

1 MNKTLYRVIF NRKRGAVXAV AETTKREGKS CADSDSGSAH VKSVPFGTTH 
51 APVCXVTNIF SFSLLGFSLC LAVGTXNIAF ADGI . . 

Further work revealed a further partial nucleotide sequence <SEQ ID 183>:

1 ATGAATAAAA CTCTCTATCG TGTAATTTTC AACCGCAAAC GTGGGGCTGT 

51 GGTAGCCGTT GCTGAAACTA CCAAGCGCGA AGGTAAAAGC TGTGCCGATA 

101 GTGATTCAGG CAGCGCTCAT GTGAAATCTG TTCCTTTTGG TACTACTCAT 

151 GCACCTGTTT GTCGTTCAAA TATCTTTTCT TTTTCTTTAT TGGGCTTTTC 

201 TTTATGTTTG GCTGTAGGTA CGGCCAATAT TGCTTTTGCT GATGGCATT. . 

This corresponds to the amino acid sequence <SEQ ID 184; ORF31-1>:

1 MNKTLYRVIF NRKRGAVVAV AETTKREGKS CADSDSGSAH VKSVPFGTTH
51 APVCRSNIFS FSLLGFSLCL AVGTANIAFA DGI. . 

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. gonorrhoeae

ORF31 shows 76.2% identity over an 84 aa overlap with a predicted ORF (ORF31ng) from N.
gonorrhoeae: 

orf31.pep MNKTLYRVIFNRKRGAVXAVAETTKREGKSCADSDSGSAHVKSVPFGTTHAPVCXVTNIF 60

I I I I i I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I : : I I I I I II :: I 

orf31ng MNKTLYRVIFNRKRGAVVAVAETTKREGKSCADSGSGSVYVKSVSFIPTH------SKAF 54

orf31.pep SFSLLGFSLCLAVGTXNIAFADGI 84

II Mlilllhll II Ml III 
orf31ng CFSALGFSLCLALGTVNIAFADGIITDKAAPKTQQATILQTGNGIPQVNIQTPTSAGVSV 114 

The complete length ORF31ng nucleotide sequence <SEQ ID 185> is: 

1 ATGAACAAAA CCCTCTATCG TGTGATTTTC AACCGCAAAC GCGGTGCTGT 

51 GGTAGCTGTT GCCGAAACCA CCAAGCGCGA AGGTAAAAGC TGTGCCGATA 

101 GTGGTTCGGG CAGCGTTTAT GTGAAATCCG TTTCTTTCAT TCCTACTCAT 

151 TCCAAAGCCT TTTGTTTTTC TGCATTAGGC TTTTCTTTAT GTTTGGCTTT 

201 GGGTACGGTC AATATTGCTT TTGCTGACGG CATTATTACT GATAAAGCTG 

251 CTCCTAAAAC CCAACAAGCC ACGATTCTGC AAACAGGTaa cGGCATACCG 

301 CAAGTCAATA TTCAAACCCC TACTTCGGCA GGGGTTTCTG TTAATCAATA 

351 TGCCCAGTTT GATGTGGGTA ATCGCGGGGC GATTTTAAAC AACAGTCGCA 

401 GCAACACCCA AACACAGCTA GGCGGTTGGA TTCAAGGCAA TCCTTGGTTG

451 ACAAGGGGCG AAGCACGTGT GGTTGTAAAC CAAATCAACA GCAGCCATCC 

501 TTCACAACTG AATGGCTATA TTGAAGTGGG TGGACGACGT GCAGAAGTCG 

551 TTATTGCCAA TCCGGCAGGG ATTGCAGTCA ATGGTGGTGG TTTTATCAAT 

601 GCTTCCCGTG CCACTTTGAC GACAGGCCAA CCGCAATATC AAGCAGGAGA 

651 CTTTAGCGGC TTTAAGATAA GGCAAGGCAA TGCTGTAATC GCCGGACACG 
701 GTTTGGATGC CCGTGATACC GATTTCACAC GTATTCTTGT ATGCCAACAA
751 AATCACCTTG ATCAGTACGG CCGAACAAGC AGGCATTCGT AA 

This encodes a protein having amino acid sequence <SEQ ID 186>: 

1 MNKTLYRVIF NRKRGAVVAV AETTKREGKS CADSGSGSVY VKSVSFIPTH
51 SKAFCFSALG FSLCLALGTV NIAFADGIIT DKAAPKTQQA TILQTGNGIP

101 QVNIQTPTSA GVSVNQYAQF DVGNRGAILN NSRSNTQTQL GGWIQGNPWL
151 TRGEARVVVN QINSSHPSQL NGYIEVGGRR AEVVIANPAG IAVNGGGFIN
201 ASRATLTTGQ PQYQAGDFSG FKIRQGNAVI AGHGLDARDT DFTRILVCQQ 
251 NHLDQYGRTS RHS* 

This gonococcal protein shares 50% identity over a 149 aa overlap with the pore-forming
hemolysin-like protein HecA from Erwinia chrysanthemi (accession number L39897):






orf31ng 96 GNGIPQWIQTPTSAGVSVNQYAQFDVGNRGAILNNSRSN-TQTQLGGWIQGNPWLTRGE 154 

GNG+.P VNI TP ++G+S N+Y F+V NRG ILNN + T +QLGG IQ NP L 
HecA 45 GNGVPWNIATPDASGLSHNRYHDFNVDNRGLILNNGTARLTPSQLGGLIQNNPNLNGRA 104 

Orf31ng 155 ARVWNQINSSHPSQLNGYIEVGGRRAEWIANPAGIAVNGGGFINASRATLTTGQPQYQ 214 

A ++N++ S + S+L GY+EV G+ A W+ANP GI +G GF+N R TLTTG PQ+ 
HecA 105 AAAI LNE WS PNRSRLAG YLEVAGQAAN WVAN PYG I TCSGCGFLNTPRLTLTTGTPQFD 164 



Orf31ng 215 -AGDFSG FKIRQGNAVI AGHGLDARDT DF 242

AG SG +R G+ +1 G GLDA +D+ 
HecA 165 AAGG L S G L D VRGG DILI DG AG LD AS R S D Y 193 

Furthermore, ORF31ng and ORF31-1 show 79.5% identity in 83 aa overlap: 

10 20 30 40 50 60 

orf31-1.pep MNKTLYRVIFNRKRGAVVAVAETTKREGKSCADSDSGSAHVKSVPFGTTHAPVCRSNIFS

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I :: I I I I I II I : I 
orf31ng MNKTLYRVIFNRKRGAVVAVAETTKREGKSCADSGSGSVYVKSVSFIPTH-----SKAFC

10 20 30 40 50 

70 80

orf31-1.pep FSLLGFSLCLAVGTANIAFADGI

II I I I I I i I I : I I : I I I I I I I I 

orf31ng FSALGFSLCLALGTVNIAFADGIITDKAAPKTQQATILQTGNGIPQVNIQTPTSAGVSVN
60 70 80 90 100 110 

On this basis, including the homology with hemolysins, and also with adhesins, it is predicted that
the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be useful antigens
for vaccines or diagnostics, or for raising antibodies.



Example 23 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 187>:

1 ATGAATACTC CTCCTTTTGT CTGTTGGATT TTTTGCAAGG TCATCGACAA

51 TTTCGGCGAC ATCGGCGTTT CGTGGCGGCT CGCCCGTGTT TTGCACCGCG 

101 AACTCGGTTG GCAGGTGCAT TTGTGGACGG ACGATGTGTC CGCCTTGCGT 

151 GCGCTTTGCC CTGATTTGCC CGATGTTCCC TGCGTTCATC AGGATATTCA 

201 TGTCCGCACT TGGCATTCCG ATGCGGCAGA TATTGATACC GCG. . 

This corresponds to the amino acid sequence <SEQ ID 188; ORF32>:

1 MNTPPFVCWI FCKVIDNFGD IGVSWRLARV LHRELGWQVH LWTDDVSALR 
51 ALCPDLPDVP CVHQDIHVRT WHSDAADIDT A. . 

Further work revealed the complete nucleotide sequence <SEQ ID 189>: 

1 ATGAATACTC CTCCTTTTGT CTGTTGGATT TTTTGCAAGG TCATCGACAA 
51 TTTCGGCGAC ATCGGCGTTT CGTGGCGGCT CGCCCGTGTT TTGCACCGCG

101 AACTCGGTTG GCAGGTGCAT TTGTGGACGG ACGATGTGTC CGCCTTGCGT 
151 GCGCTTTGCC CTGATTTGCC CGATGTTCCC TGCGTTCATC AGGATATTCA

201 TGTCCGCACT TGGCATTCCG ATGCGGCAGA TATTGATACC GCGCCTGTTC 

251 CCGATGTCGT CATCGAAACT TTTGCCTGCG ACCTGCCCGA AAATGTGCTG 

301 CACATTATCC GCCGACACAA GCCGCTTTGG CTGAATTGGG AATATTTGAG 

351 CGCGGAGGAA AGCAATGAAA GGCTGCATCT GATGCCTTCG CCGCAGGAGG 

401 GTGTTCAAAA ATATTTTTGG TTTATGGGTT TCAGCGAAAA AAGCGGCGGG 

451 TTGATACGCG AACGTGATTA CTGCGAAGCC GTCCGTTTCG ATACTGAAGC 

501 CCTGCGAGAG CGGCTGATGC TGCCCGAAAA AAACGCCTCC GAATGGCTGC 

551 TTTTCGGCTA TCGGAGCGAT GTTTGGGCAA AGTGGCTGGA AATGTGGCGA 

601 CAGGCAGGCA GCCCGATGAC ACTGTTGCTG GCGGGGACGC AAATCATCGA 

651 CAGCCTCAAA CAAAGCGGCG TTATTCCGCA AGATGCCCTG CAAAACGACG 

701 GCGATGTTTT TCAGACGGCA TCCGTCCGCC TCGTCAAAAT CCCTTTCGTG 

751 CCGCAACAGG ACTTCGACCA ACTGCTGCAC CTTGCCGACT GCGCCGTCAT 

801 CCGCGGCGAA GACAGTTTCG TGCGCGCCCA GCTTGCGGGC AAACCCTTCT 

851 TTTGGCACAT CTACCCGCAA GACGAGAATG TGCATCTCGA CAAACTCCAC 

901 GCCTTTTGGG ATAAGGCACA CGGTTTCTAC ACGCCCGAAA CCGTGTCGGC 

951 ACACCGCCGT CTTTCGGACG ACCTCAACGG CGGAGAGGCT TTATCCGCAA 

1001 CACAACGCCT CGAATGTTGG CAAACCCTGC AACAACATCA AAACGGCTGG 

1051 CGGCAAGGCG CGGAGGATTG GAGCCGTTAT CTTTTCGGGC AGCCGTCAGC 

1101 TCCTGAAAAA CTCGCTGCCT TTGTTTCAAA GCATCAAAAA ATACGCTAG 

This corresponds to the amino acid sequence <SEQ ID 190; ORF32-1>:

1 MNTPPFVCWI FCKVIDNFGD IGVSWRLARV LHRELGWQVH LWTDDVSALR 

51 ALCPDLPDVP CVHQDIHVRT WHSDAADIDT APVPDVVIET FACDLPENVL

101 HIIRRHKPLW LNWEYLSAEE SNERLHLMPS PQEGVQKYFW FMGFSEKSGG 

151 LIRERDYCEA VRFDTEALRE RLMLPEKNAS EWLLFGYRSD VWAKWLEMWR 

201 QAGSPMTLLL AGTQIIDSLK QSGVIPQDAL QNDGDVFQTA SVRLVKIPFV 

251 PQQDFDQLLH LADCAVIRGE DSFVRAQLAG KPFFWHIYPQ DENVHLDKLH 

301 AFWDKAHGFY TPETVSAHRR LSDDLNGGEA LSATQRLECW QTLQQHQNGW 

351 RQGAEDWSRY LFGQPSAPEK LAAFVSKHQK IR*

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A) 

ORF32 shows 93.8% identity over an 81 aa overlap with an ORF (ORF32a) from strain A of N.
meningitidis: 

10 20 30 40 50 60 

orf32.pep MNTPPFVCWIFCKVIDNFGDIGVSWRLARVLHRELGWQVHLWTDDVSALRALCPDLPDVP

MUM M M M M M M M M M M M M M M M M M M M I I M M M M I 
orf32a MNTPPFSAGXFCKVIDNFGDIGVSWRLARVLHRELGWQVHLWTDDVSALRALCPDLPDVX

10 20 30 40 50 60 

70 80 
orf32.pep CVHQDIHVRTWHSDAADIDTA

M M M M M M M M M M I 
orf32a CVHQDIHVRTWHSDAADIDTAPVXDVVIETFACDLPENVLHIIRRHKPLWLXWEYLSAEX

70 80 90 100 110 120 

The complete length ORF32a nucleotide sequence <SEQ ID 191> is: 

1 ATGAATACTC CTCCTTTTTC TGCTGGANTT TTTTGCAAGG TCATCGACAA 

51 TTTCGGCGAC ATCGGCGTTT CGTGGCGGCT TGCCCGTGTT TTGCACCGCG 

101 AACTCGGTTG GCAGGTGCAT TTGTGGACGG ACGATGTGTC CGCCTTGCGT 

151 GCGCTTTGCC CTGATTTGCC CGATGTTCNC TGCGTTCATC AGGATATTCA 

201 TGTCCGCACT TGGCATTCCG ATGCGGCAGA TATTGATACC GCGCCTGTTC 

251 NCGATGTCGT CATCGAAACT TTTGCCTGCG ACCTGCCCGA AAATGTGCTG 

301 CACATCATCC GCCGACACAA GCCGCTTTGG CTGAANTGGG AATATTTGAG 

351 CGCGGAGGAN AGCAATGAAA GGCTGCACNT GATGCCTTCG CCGCAGGAGA 

401 GTGTTCNAAA ATANTTTTGG TTTATGGGTT TCAGCGAANN NAGCGGCGGA 

451 CTGATACGCG AACGCGATTA CTGCGAAGCC GTCCGTTTCG ATAGCGGAGC 

501 CTTGCGCAAG AGGCTGATGC TTCCCGAAAA AAACGNCCCC GAATGGCTGC 

551 TTTTCGGCTA TCGGAGCGAT GTTTGGGCAA AGTGGCTGGA AATGTGGCGA 

601 CAGGCAGGCA GTCCGTTGAC ACTTTTGCTG GCNGGGGCGC ANATTATCGA 

651 CAGCCTCAAA CAAAACGGCG TTATTCCGCA AGATGCCCTG CAAAACGACG 

701 GCGATGTTTT TCAGACGGCA TCCGTCCGCC TCGTCAAAAT CCCTTTCGTG 

751 CCGCAACAGG ACTTCGACAA ACTGCTGCAC CTTGCCGACT GCGCCGTCAT 






801 CCGCGGCGAA GACAGTTTCG TGCGCGCCCA GCTTGCGGGC AAACCCTTCT

851 TTTGGCACAT CTACCCGCAA GATGAGAATG TCCATCTCGA CAAACTCCAC

901 GCCTTTTGGG ATAAGGCACA CGGTTTCTAC ACGCCCGAAA CCGCATCGGC

951 ACACCGCCGC CTTTCAGACG ACCTCAACGG CGGAGAGGCT TTATCCGCAA

1001 CACAACGCCT CGAATGTTGG CAAATCCTGC AACAACATCA AAACGGCTGG

1051 CGGCAAGGCG CGGAGGATTG GAGCCGTTAT CTTTTTGGGC AGCCTTCCGC

1101 ATCCGAAAAA CTCGCCGCCT TTGTTTCAAA GCATCAAAAA ATACGCTAG



This encodes a protein having amino acid sequence <SEQ ID 192>: 



1 MNTPPFSAGX FCKVIDNFGD IGVSWRLARV LHRELGWQVH LWTDDVSALR

51 ALCPDLPDVX CVHQDIHVRT WHSDAADIDT APVXDVVIET FACDLPENVL

101 HIIRRHKPLW LXWEYLSAEX SNERLHXMPS PQESVXKXFW FMGFSEXSGG

151 LIRERDYCEA VRFDSGALRK RLMLPEKNXP EWLLFGYRSD VWAKWLEMWR

201 QAGSPLTLLL AGAXIIDSLK QNGVIPQDAL QNDGDVFQTA SVRLVKIPFV

251 PQQDFDKLLH LADCAVIRGE DSFVRAQLAG KPFFWHIYPQ DENVHLDKLH

301 AFWDKAHGFY TPETASAHRR LSDDLNGGEA LSATQRLECW QILQQHQNGW

351 RQGAEDWSRY LFGQPSASEK LAAFVSKHQK IR*



ORF32a and ORF32-1 show 93.2% identity in 382 aa overlap: 






10 20 30 40 50 60 

orf 32-1. pep MNTPPFVCWIFCKVIDNFGDIGVSWRLARVLHRELGWQVHLWTDDVSALRALCPDLPDVP 

I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

orf 32a MNTPPFSAGXFCKVIDNFGDIGVSWRLARVLHRELGWQVHLWTDDVSALRALCPDLPDVX 

10 20 30 40 50 60 

70 80 90 100 110 120 

orf 32-1. pep CVHQDIHVRTWHSDAADIDTAPVPDWIETFACDLPENVLHIIRRHKPLWLNWEYLSAEE 

I I I I I I I I I I I I I I I I I I I I I I I 1 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
orf 32a CVHQDIHVRTWHSDAADIDTAPVXDWIETFACDLPENVLHIIRRHKPLWLXWEYLSAEX 

70 80 90 100 110 120 

130 140 150 160 170 180. 

orf 32-1 . pep SNERLHLMPSPQEGVQKYFWFMGFSEKSGGLIRERDYCEAVRFDTEALRERLMLPEKNAS 

I I I I I I I I I I I I : I I I I I I I I I I I I I I II I I I I I I I I I ! I : 111:11111111 
orf 32a SNERLHXM P S PQESVXKXFW FMGFSEXSGG LI RE RDYCEAVRFDSGALRKRLMLPEKNXP 

130 140 150 160 170 180 

190 200 210 220 230 240 

orf 32-1. pep EWLLFGYRSDVWAKWLEMWRQAGSPMTLLLAGTQIIDSLKQSGVIPQDALQNDGDVFQTA 
i I I I I I I I I I I I I I I I I I I I I I I I I : I I I I I I : I II I I I I : I I I I I I I I II I I II I I I I 
orf 32a EWLLFGYRSDVWAKWLEMWRQAGSPLTLLLAGAXIIDSLKQNGVIPQDALQNDGDVFQTA 

190 200 210 220 230 240 

250 260 270 280 290 300 

orf 32-1 . pep SVRLVKIPFVPQQDFDQLLHLADCAVIRGEDSFVRAQLAGKPFFWHIYPQDENVHLDKLH 
I I I I I I I I I I I I I I I I : I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
orf 32a SVRLVKIPFVPQQDFDKLLHLADCAVIRGEDSFVRAQLAGKPFFWHIYPQDENVHLDKLH 

250 260 270 280 290 300 

310 320 330 340 350 360 

orf 32-1 . pep AFWDKAHGFYTPETVSAHRRLSDDLNGGEALSATQRLECWQTLQQHQNGWRQGAEDWSRY 
I I I I I I I I I I I I I I : II I I I I I I II I I I I I I I II I I I I I I I I I I I I I I I I II I I I I I I I 
orf 32a AFWDKAHGFYTPETASAHRRLSDDLNGGEALSATQRLECWQILQQHQNGWRQGAEDWSRY 

310 320 330 340 350 360 

370 380 
orf 32-1 . pep L FGQ P S APE KLAAFVS KHQK IRX 
I I I I I I I I I I I I I I I I I M I I I 
orf 32a LFGQPSASEKLAAFVSKHQKIRX 

370 380 



Homology with a predicted ORF from N. gonorrhoeae

ORF32 shows 95.1% identity over an 82 aa overlap with a predicted ORF (ORF32.ng) from N.
gonorrhoeae: 

orf32.pep   MNTPPF-VCWIFCKVIDNFGDIGVSWRLARVLHRELGWQVHLWTDDVSALRALCPDLP  57
            |||  | |||||||||||||||||||||||||||||||||||||||||||||||||||
orf32ng   MVMNTYAFPVCWIFCKVIDNFGDIGVSWRLARVLHRELGWQVHLWTDDVSALRALCPDLP  60

orf32.pep DVPCVHQDIHVRTWHSDAADIDTA                                      81
          ||| ||||||||||||||||||||
orf32ng   DVPFVHQDIHVRTWHSDAADIDTAPVPDAVIETFACDLPENVLNIIRRHKPLWLNWEYLS 120

An ORF32ng nucleotide sequence <SEQ ID 193> was predicted to encode a protein having amino 
acid sequence <SEQ ID 194>: 

1 MVMNTYAFPV CWIFCKVIDN FGDIGVSWRL ARVLHRELGW QVHLWTDDVS 

51 ALRALCPDLP DVPFVHQDIH VRTWHSDAAD IDTAPVPDAV IETFACDLPE 

101 NVLNIIRRHK PLWLNWEYLS AEESNERLHL MPSPQEGVQK YFWFMGFSEK 

151 SGGLIRERDY REAVRFDTEA LRRRLVLPEK NAPEWLLFGY RGDVWAKWLD 

201 MWQQAGSLMT LLLAGAQIID SLKQSGVIPQ NALQNEGGVF QTASVRLVKI 

251 PFVPQQDFDK LLHLADCAVI RGEDSFVRTQ LAGKPFFWHI YPQDENVHLD 

301 KLHAFWDKAY GFYTPETASV HRLLSDDLNG GEALSATQRL ECGVL* 

Further sequencing revealed the following DNA sequence <SEQ ID 195>: 

1 ATGAATACAT ACGCTTTTCC TGTCTGTTGG ATTTTTTGCA AGGTCATCGA

51 CAATTTCGGC GACATCGGCG TTTCGTGGCG GCTCGCCCGT GTTTTGCACC 

101 GCGAACTCGG TTGGCAGGTG CATTTGTGGA CGGACGACGT GTCCGCCTTG 

151 CGCGCGCTTT GTCCCGATTT GCCCGATGTT CCCTTCGTTC ATCAGGATAT 

201 TCATGTCCGC ACTTGGCATT CCGATGCGGC AGACATTGAT ACCGCGCCCG 

251 TTCCCGATGC CGTTATCGAA ACTTTTGCCT GCGACCTGCC CGAAAATGTG 

301 CTGAACATCA TCCGCCGACA CAAACCGCTT TGGCTGAATT GGGAATATTT 

351 GAGCGCGGAG GAAAGCAATG AAAGGCTGCA CCTGATGCCT TCGCCGCAGG 

401 AGGGCGTTCA AAAATATTTT TGGTTTATGG GTTTCAGCGA AAAAAGCGGC 

451 GGGTTGATAC GCGAACGCGA TTACCGCGAA GCCGTCCGTT TCGATACCGA 

501 AGCCCTGCGC CGGCGGCTGG TGCTGCCCGA AAAAAACGCC CCCGAATGGC 

551 TGCTTTTCGG CTATCGGGGC GATGTTTGGG CAAAGTGGCT GGACATGTGG 

601 CAACAGGCAG GCAGCCTGAT GACCCTACTG CTGGCGGGGG CGCAAATTAT 

651 CGACAGCCTC AAACAAAGCG GCGTTATTCC GCAAAACGCC CTGCAAAAtg 

701 aaggcgGTGT CTTTCagacG gcatccgTcC gccttGTCAA AAtcCCGTTC 

751 GTGCcGCAAC AGGAcTTCGA CAAATTGCTG CAcctcgcCG ACTGCGCCGT 

801 GATACGCGGC GAAGACAGTT TCGTGCGTAC CCAGCTTGCC GGAAAACCCT 

851 TTTTTTGGCA CATCTACCCG CAAGACGAGA ATGTCCATCT CGACAAACTC 

901 CACGCCTTTT GGGATAAGGC ATACGGCTTC TACACGCCCG AAACCGCATC 

951 GGTGCACCGC CTCCTTTCGG ACGACCTCAA CGGCGGAGAG GCTTTATCCG 

1001 CAACACAACG CCTCGAATGT TGGCAAACCC TGCAACAACA TCAAAACGGC 

1051 TGGCGGCAAG GCGCGGAGGA TTGGAGCCGT TATCTTTTCG GGCAGCCTTC 

1101 CGCATCCGAA AAACTCGCCG CCTTTGTTTC AAAGCATCAA AAAATACGCT 

1151 AG 

This encodes a protein having amino acid sequence <SEQ ID 196; ORF32ng-1>:



1 MNTYAFPVCW IFCKVIDNFG DIGVSWRLAR VLHRELGWQV HLWTDDVSAL 

51 RALCPDLPDV PFVHQDIHVR TWHSDAADID TAPVPDAVIE TFACDLPENV 

101 LNIIRRHKPL WLNWEYLSAE ESNERLHLMP SPQEGVQKYF WFMGFSEKSG 

151 GLIRERDYRE AVRFDTEALR RRLVLPEKNA PEWLLFGYRG DVWAKWLDMW 

201 QQAGSLMTLL LAGAQIIDSL KQSGVIPQNA LQNEGGVFQT ASVRLVKIPF 

251 VPQQDFDKLL HLADCAVIRG EDSFVRTQLA GKPFFWHIYP QDENVHLDKL 

301 HAFWDKAYGF YTPETASVHR LLSDDLNGGE ALSATQRLEC WQTLQQHQNG 

351 WRQGAEDWSR YLFGQPSASE KLAAFVSKHQ KIR* 

ORF32ng-1 and ORF32-1 show 93.5% identity in 383 aa overlap:



10 20 30 40 50 59 

orf 32-1 . pep MNTPPF-VCWIFCKVIDNFGDIGVSWRLARVLHRELGWQVHLWTDDVSALRALCPDLPDV 
III I I I I I I I I I I I I I II I I I I I I I I I I I I I I ! I I I I I I I I I I I I II I I I I I I I I I I 
orf32ng-l MNTYAFPVCWIFCKVIDNFGDIGVSWRLARVLHRELGWQVHLWTDDVSALRALCPDLPDV 

10 20 30 40 50 60 



60 70 80 90 100 110 119 

orf32-1.pep PCVHQDIHVRTWHSDAADIDTAPVPDVVIETFACDLPENVLHIIRRHKPLWLNWEYLSAE
I M I I I I I I I I I I II I I I I I I I I I I : I I I I I I I I I I I I I I : I I I I I I I I I I I I I I I I I I 
orf32ng-1 PFVHQDIHVRTWHSDAADIDTAPVPDAVIETFACDLPENVLNIIRRHKPLWLNWEYLSAE

70 80 90 100 110 120 



120 130 140 150 160 170 179 
orf32-1.pep ESNERLHLMPSPQEGVQKYFWFMGFSEKSGGLIRERDYCEAVRFDTEALRERLMLPEKNA
I II I I I I I I I I I I I I I I I I I I I I I I I I ! I I I I I I I I I I I I I I I I i I i I I : I I : I M I I I 
orf32ng-l ESNERLHLMPSPQEGVQKYFWFMGFSEKSGGLIRERDYREAVRFDTEALRRRLVLPEKNA 

130 140 150 160 170 180 


180 190 200 210 220 230 239 

orf 32-1. pep SEWLLFGYRSDVWAKWLEMWRQAGSPMTLLLAGTQIIDSLKQSGVIPQDALQNDGDVFQT 
I I I I I I II : I I I I M I : I I : I I I I I I I I I I I : I I I I I I I I I I I I I I : I I I f: I MM 
orf32ng-l PEWLLFGYRGDVWAKWLDMWQQAGSLMTLLLAGAQIIDSLKQSGVIPQNALQNEGGVFQT 
190 200 210 220 230 240

240 250 260 270 280 290 299 

orf 32-1 . pep ASVRLVKIPFVPQQDFDQLLHLADCAVIRGEDSFVRAQLAGKPFFWHIYPQDENVHLDKL 
I I I I I I I I I I II I I II I : I I I I I I I I I I I I I I I I I I : I I I I I I I II I I I II I I I I I I I I I 
orf32ng-1 ASVRLVKIPFVPQQDFDKLLHLADCAVIRGEDSFVRTQLAGKPFFWHIYPQDENVHLDKL

250 260 270 280 290 300 

300 310 320 330 340 350 359 

orf 32-1 . pep HAFWDKAHGFYTPETVSAHRRLSDDLNGGEALSATQRLECWQTLQQHQNGWRQGAEDWSR 
I I I I I I I : I I I I II I : I : I I I I I M I I I II I I M I I I I I I I I I I I I I I I I I I I I I I I I I

orf32ng-l HAFWDKAYGFYTPETASVHRLLSDDLNGGEALSATQRLECWQTLQQHQNGWRQGAEDWSR 

310 320 330 340 350 360 

360 370 380 

orf32-1.pep YLFGQPSAPEKLAAFVSKHQKIRX

I I I I II I I I I I I I I I I I II I I I I 
orf32ng-l YLFGQPSASEKLAAFVSKHQKIRX 

370 380 

On this basis, including the RGD sequence in the gonococcal protein, characteristic of adhesins,
it is predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could
be useful antigens for vaccines or diagnostics, or for raising antibodies.
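The RGD tripeptide referred to above can be located by a plain substring scan. A minimal sketch, assuming exact matching of the one-letter motif and the 1-based residue numbering used in the listings; `find_motif` is an illustrative name, not part of the original analysis. Applied to residues 151-200 of ORF32ng-1 <SEQ ID 196>, it places the motif at residue 189, while the same stretch of the meningococcal ORF32-1 <SEQ ID 190> reads RSD at that point and contains no RGD:

```python
from typing import List


def find_motif(protein: str, motif: str, first_position: int = 1) -> List[int]:
    """Return the residue numbers at which `motif` starts in `protein`.

    Overlapping matches are reported. `first_position` is the residue number
    of protein[0], so a fragment cut from a numbered listing reports
    positions in the full-length sequence.
    """
    hits = []
    start = protein.find(motif)
    while start != -1:
        hits.append(first_position + start)
        start = protein.find(motif, start + 1)
    return hits


# Residues 151-200 of ORF32ng-1 <SEQ ID 196> and of ORF32-1 <SEQ ID 190>.
ORF32NG_1_151_200 = "GLIRERDYREAVRFDTEALRRRLVLPEKNAPEWLLFGYRGDVWAKWLDMW"
ORF32_1_151_200 = "LIRERDYCEAVRFDTEALRERLMLPEKNASEWLLFGYRSDVWAKWLEMWR"

print(find_motif(ORF32NG_1_151_200, "RGD", first_position=151))  # [189]
print(find_motif(ORF32_1_151_200, "RGD", first_position=151))  # []
```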

ORF32-1 (42 kDa) was cloned in pET and pGex vectors and expressed in E. coli, as described
above. The products of protein expression and purification were analyzed by SDS-PAGE. Figure
7A shows the results of affinity purification of the His-fusion protein, and Figure 7B shows the
results of expression of the GST-fusion in E. coli. Purified His-fusion protein was used to immunise
mice, whose sera were used for ELISA, giving a positive result. These experiments confirm that
ORF32-1 is a surface-exposed protein, and that it is a useful immunogen.
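The 42 kDa size given for ORF32-1 is consistent with the common rule of thumb of about 110 Da per amino acid residue, applied to the 382 residues of ORF32-1 <SEQ ID 190> (stop codon excluded). A minimal sketch of that estimate; the 110 Da mean is an assumed rule-of-thumb value, not one stated in this document, and an exact figure would need per-residue masses and, for the fusion proteins, the mass of the His or GST tag:

```python
MEAN_RESIDUE_MASS_DA = 110.0  # assumed rule-of-thumb average, not a measured value


def estimated_mass_kda(n_residues: int, mean_residue_mass_da: float = MEAN_RESIDUE_MASS_DA) -> float:
    """Approximate protein mass in kDa from the residue count alone."""
    if n_residues < 0:
        raise ValueError("residue count cannot be negative")
    return n_residues * mean_residue_mass_da / 1000.0


# ORF32-1 <SEQ ID 190> has 382 residues before the stop codon.
print(f"ORF32-1: ~{estimated_mass_kda(382):.0f} kDa")  # ORF32-1: ~42 kDa
```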

Example 24 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 197>:

1 . . TTGTTCCTGC GTGTNAAAGT GGGGCGTTTT TTCAGCAGTC CGGCGACGTG 

51 GTTTCGGGNC AAAGACCCTG TAAATCAGGC GGTGTTGCGG CTGTATNCGG 

101 ACGAGTGGCG GCA.ACTTCG GTACGTTGGA AAATAGNCGC AACGTCGCAC 

151 AGCCTGTGGC TCTGCACGCT GCTCGGAATG CTGGTGTCGG TATTGTTGCT 

201 GCTTTTGGTG CGGCAATATA CGTTCAACTG GGAAAGCACG CTGTTGAGCA

251 ATGCCGCTTC GGTACGCGCG GTGGAAATGT TGGCATGGCT GCCGTCGAAA 

301 CTCGGTTTCC CTGTCCCCGA TGCGCGGTCG GTCATCGAAG GCCGTCTGAA 

351 CGGCAATATT GCCGATGCGC GGGCTTGGTC GGGGCTGCTG GTCGNCAGTA 

401 TCGCCTGCTA NGGCATCCTG CCGCGCCTG. . 

This corresponds to the amino acid sequence <SEQ ID 198; ORF33>:

1 . . LFLRVKVGRF FSSPATWFRX KDPVNQAVLR LYXDEWRXTS VRWKIXATSH 
51 SLWLCTLLGM LVSVLLLLLV RQYTFNWEST LLSNAASVRA VEMLAWLPSK 
101 LGFPVPDARS VIEGRLNGNI ADARAWSGLL VXSIACXGIL PRL. . 
Further work revealed the complete nucleotide sequence <SEQ ID 199>:



1 ATGTTGAATC CATCCCGAAA ACTGGTTGAG CTGGTCCGTA TTTTGGACGA
51 AGGCGGTTTT ATTTTCAGCG GCGATCCCGT ACAGGCGACG GAGGCTTTGC
101 GCCGCGTGGA CGGCAGTACG GAGGAAAAAA TCATCCGTCG GGCGGAGATG
151 ATTGACAGGA ACCGTATGCT GCGGGAGACG TTGGAACGTG TGCGTGCGGG
201 GTCGTTCTGG TTGTGGGTGG TGGCGGCGAC GTTTGCATTT TTTACCGGTT
251 TTTCAGTCAC TTATCTTCTA ATGGACAATC AGGGTCTGAA TTTCTTTTTG
301 GTTTTGGCGG GCGTGTTGGG CATGAATACG CTGATGCTGG CAGTATGGTT
351 GGCAATGTTG TTCCTGCGTG TGAAAGTGGG GCGTTTTTTC AGCAGTCCGG
401 CGACGTGGTT TCGGGGCAAA GACCCTGTAA ATCAGGCGGT GTTGCGGCTG
451 TATGCGGACG AGTGGCGGCA ACCTTCGGTA CGTTGGAAAA TAGGCGCAAC
501 GTCGCACAGC CTGTGGCTCT GCACGCTGCT CGGAATGCTG GTGTCGGTAT
551 TGTTGCTGCT TTTGGTGCGG CAATATACGT TCAACTGGGA AAGCACGCTG
601 TTGAGCAATG CCGCTTCGGT ACGCGCGGTG GAAATGTTGG CATGGCTGCC
651 GTCGAAACTC GGTTTCCCTG TCCCCGATGC GCGGGCGGTC ATCGAAGGCC
701 GTCTGAACGG CAATATTGCC GATGCGCGGG CTTGGTCGGG GCTGCTGGTC
751 GGCAGTATCG CCTGCTACGG CATCCTGCCG CGCCTGCTGG CTTGGGTAGT
801 GTGTAAAATC CTTTTGAAAA CAAGCGAAAA CGGATTGGAT TTGGAAAAGC
851 CCTATTATCA GGCGGTCATC CGCCGCTGGC AGAACAAAAT CACCGATGCG
901 GATACGCGTC GGGAAACCGT GTCCGCCGTT TCACCGAAAA TCATCTTGAA
951 CGATGCGCCG AAATGGGCGG TCATGCTGGA GACCGAGTGG CAGGACGGCG
1001 AATGGTTCGA GGGCAGGCTG GCGCAGGAAT GGCTGGATAA GGGCGTTGCC
1051 ACCAATCGGG AACAGGTTGC CGCGCTGGAG ACAGAGCTGA AGCAGAAACC
1101 GGCGCAACTG CTTATCGGCG TGCGCGCCCA AACTGTGCCG GACCGCGGCG
1151 TGTTGCGGCA GATTGTCCGA CTCTCGGAAG CGGCGCAGGG CGGCGCGGTG
1201 GTGCAGCTTT TGGCGGAACA GGGGCTTTCA GACGACCTTT CGGAAAAGCT
1251 GGAACATTGG CGTAACGCGC TGGCCGAATG CGGCGCGGCG TGGCTTGAGC
1301 CTGACAGGGC GGCGCAGGAA GGGCGTTTGA AAGACCAATA A
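Each listing in this section prints a sequence as numbered rows of five 10-character blocks, the number at the start of a row giving the position of the first base or residue in that row. A minimal sketch of that layout; the block width and row length are inferred from the listings rather than stated in the text, and `format_listing` is an illustrative name:

```python
def format_listing(seq: str, block: int = 10, blocks_per_row: int = 5) -> str:
    """Lay out a sequence as numbered rows of fixed-width blocks.

    Each row starts with the 1-based position of its first character,
    so with the defaults rows are numbered 1, 51, 101, ...
    """
    row_len = block * blocks_per_row
    rows = []
    for start in range(0, len(seq), row_len):
        row = seq[start:start + row_len]
        blocks = [row[i:i + block] for i in range(0, len(row), block)]
        rows.append(f"{start + 1} " + " ".join(blocks))
    return "\n".join(rows)


# First 60 residues of ORF32-1 <SEQ ID 190>.
print(format_listing("MNTPPFVCWIFCKVIDNFGDIGVSWRLARVLHRELGWQVHLWTDDVSALRALCPDLPDVP"))
```

The call prints the first row of the ORF32-1 listing ("1 MNTPPFVCWI ... LWTDDVSALR") followed by a second row starting "51 ALCPDLPDVP". Fixing the row length as a multiple of the block width is what keeps every row number equal to 1 plus a multiple of 50.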



This corresponds to the amino acid sequence <SEQ ID 200; ORF33-1>:



1 MLNPSRKLVE LVRILDEGGF IFSGDPVQAT EALRRVDGST EEKIIRRAEM
51 IDRNRMLRET LERVRAGSFW LWVVAATFAF FTGFSVTYLL MDNQGLNFFL
101 VLAGVLGMNT LMLAVWLAML FLRVKVGRFF SSPATWFRGK DPVNQAVLRL
151 YADEWRQPSV RWKIGATSHS LWLCTLLGML VSVLLLLLVR QYTFNWESTL
201 LSNAASVRAV EMLAWLPSKL GFPVPDARAV IEGRLNGNIA DARAWSGLLV
251 GSIACYGILP RLLAWVVCKI LLKTSENGLD LEKPYYQAVI RRWQNKITDA
301 DTRRETVSAV SPKIILNDAP KWAVMLETEW QDGEWFEGRL AQEWLDKGVA
351 TNREQVAALE TELKQKPAQL LIGVRAQTVP DRGVLRQIVR LSEAAQGGAV
401 VQLLAEQGLS DDLSEKLEHW RNALAECGAA WLEPDRAAQE GRLKDQ*



Computer analysis of this amino acid sequence gave the following results: 



Homology with a predicted ORF from N. meningitidis (strain A) 

ORF33 shows 90.9% identity over a 143 aa overlap with an ORF (ORF33a) from strain A of N.
meningitidis: 

10 20 30 

orf 33 . pep LFLRVKVGRFFSSPATWFRXKDPVNQAVLR 

I I I I I 1 II I I I I I I I I I I I I I I I I I I I I I 
orf 33a LMDNQGLNF FLVLAGVXGMNTLMLAV WLAMLFLRVKVGRFFSSPATWFRGKDPVNQAVLR 
90 100 110 120 130 140 



40 50 60 70 80 90 

orf 33 . pep LYXDEWRXTSVRWKIXATSHSLW LCTLLGMLVSVLLLLLV RQYTFNWESTLLSNAASVRA 

II I I 1 I I I I I I I I 1 I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I : ::: I ! I 
orf 33a LYADEWRXPSVRWKIGATSHSLW LCTLLGMLVSVLLLLLV RQYTFNWESTLLGDSSSVRL 
150 160 170 180 190 200 



100 110 120 130 140 

orf 33 . pep VEMLAWLPSKLGFPVPDARSVIEGRLNGNIADARAWSG LLVXSIACXGILPRL 
I I I I I I II : I I I I I I I I I I : I I I I I I I I I I I I I I I I I I I II 1 1 I I I I I I I 1 
orf 33a VEMLAWLPAKLGFPVPDARAVIEGRLNGNIADARAWSG LLVGSIACYGILPRLLAW AVCK 
210 220 230 240 ' 250 260 



orf33a ILXXTSENGLDLEKXXXXXXIRRWQNKITDADTRRETVSAVSPKIVLNDAPKWAVMLETE
270 280 290 300 310 320
The complete length ORF33a nucleotide sequence <SEQ ID 201> is:



1 ATGTTGAATC CATCCCGAAA ACTGGTTGAG CTGGTCCGTA TTTTGGAAGA
51 AGGCGGCTTT ATTTTCAGCG GCGATCCCGT GCAGGCGACG GAGGCTTTGC
101 GCCGCGTGGA CGGCAGTACG GAGGAAAAAA TCATCCGTCG GGCGAAGATG
151 ATCGACAGGA ACCGTATGCT GCGGGAGACG TTGGAACGTG TGCGTGCGGG
201 GTCGTTCTGG TTGTGGGTGG CGGCGGCGAC GTTTGCGTTT NTTACCGNTT
251 TTTCAGTTAC TTATCTTCTA ATGGACAATC AGGGTCTGAA TTTCTTTTTG
301 GTTTTGGCGG GCGTGNTGGG CATGAATACG CTGATGCTGG CAGTATGGTT
351 GGCAATGTTG TTCCTGCGCG TGAAAGTGGG GCGTTTTTTC AGCAGTCCGG
401 CGACGTGGTT TCGGGGCAAA GACCCTGTCA ATCAGGCGGT GTTGCGGCTG
451 TATGCGGACG AGTGGCGGCN ACCTTCGGTA CGTTGGAAAA TAGGCGCAAC
501 GTCGCACAGC CTGTGGCTCT GCACGCTGCT CGGAATGCTG GTGTCGGTAT
551 TGTTGCTGCT TTTGGTGCGG CAATATACGT TCAACTGGGA AAGCACGCTG
601 TTGGGCGATT CGTCTTCGGT ACGGCTGGTG GAAATGTTGG CATGGCTGCC
651 TGCGAAACTG GGTTTTCCCG TGCCTGATGC GCGGGCGGTC ATCGAAGGTC
701 GTCTGAACGG CAATATTGCC GATGCGCGGG CTTGGTCGGG GCTGCTGGTC
751 GGCAGTATCG CCTGCTACGG CATCCTGCCG CGCCTCTTGG CTTGGGCGGT
801 ATGCAAAATC CTTNTGNAAA CAAGCGAAAA CGGCTTGGAT TTGGAAAAGC
851 NCNNNNNTCN NNCGNTCATC CGCCGCTGGC AGAACAAAAT CACCGATGCG
901 GATACGCGTC GGGAAACCGT GTCCGCCGTT TCGCCGAAAA TCGTCTTGAA
951 CGATGCGCCG AAATGGGCGG TCATGCTGGA GACCGAATGG CAGGACGGCG
1001 AATGGTTCGA GGGCAGGCTG GCGCAGGAAT GGCTGGATAA GGGCGTTGCC
1051 GCCAATCGGG AACAGGTTGC CGCGCTGGAG ACAGAGCTGA AGCAGAAACC
1101 GGCGCAACTG CTTATCGGCG TGCGCGCCCA AACTGTGCCC GACCGCGGCG
1151 TGTTGCGGCA GATCGTCCGA CTTTCGGAAG CGGCGCAGGG CGGCGCGGTG
1201 GTGCANCTTT TGGCGGAACA GGGGCTTTCA GACGACCTTT CGGAAAAGCT
1251 GGAACATTGG CGTAACGCGC TGACCGAATG CGGCGCGGCG TGGCTGGAAC
1301 CCGACAGAGC GGCGCAGGAA GGCCGTCTGA AAACCAACGA CCGCACTTGA


This encodes a protein having amino acid sequence <SEQ ID 202>: 



1 MLNPSRKLVE LVRILEEGGF IFSGDPVQAT EALRRVDGST EEKIIRRAKM
51 IDRNRMLRET LERVRAGSFW LWVAAATFAF XTXFSVTYLL MDNQGLNFFL
101 VLAGVXGMNT LMLAVWLAML FLRVKVGRFF SSPATWFRGK DPVNQAVLRL
151 YADEWRXPSV RWKIGATSHS LWLCTLLGML VSVLLLLLVR QYTFNWESTL
201 LGDSSSVRLV EMLAWLPAKL GFPVPDARAV IEGRLNGNIA DARAWSGLLV
251 GSIACYGILP RLLAWAVCKI LXXTSENGLD LEKXXXXXXI RRWQNKITDA
301 DTRRETVSAV SPKIVLNDAP KWAVMLETEW QDGEWFEGRL AQEWLDKGVA
351 ANREQVAALE TELKQKPAQL LIGVRAQTVP DRGVLRQIVR LSEAAQGGAV
401 VXLLAEQGLS DDLSEKLEHW RNALTECGAA WLEPDRAAQE GRLKTNDRT*



ORF33a and ORF33-1 show 94.1% identity in 444 aa overlap: 



10 20 30 40 50 60 

orf 33a . pep MLNPSRKLVELVRILEEGGFIFSGDPVQATEALRRVDGSTEEKIIRRAKMIDRNRMLRET 
I I I I I I I I I I I I I I I : I I I I I I I I I I I I I I I I I I I I ! I I I I I I I t I I I : I I I I I I I I I I I 
orf 33-1 MLNPSRKLVELVRILDEGGFIFSGDPVQATEALRRVDGSTEEKIIRRAEMIDRNRMLRET 

10 20 30 40 50 60 



70 80 90 100 110 120 

orf33a.pep  LERVRAGSFWLWVAAATFAFXTXFSVTYLLMDNQGLNFFLVLAGVXGMNTLMLAVWLAML
            |||||||||||||:|||||| | |||||||||||||||||||||| ||||||||||||||
orf33-1     LERVRAGSFWLWVVAATFAFFTGFSVTYLLMDNQGLNFFLVLAGVLGMNTLMLAVWLAML

70 80 90 100 110 120 



130 140 150 160 170 180 

orf 33a . pep FLRVKVGRFFSSPATWFRGKDPVNQAVLRLYADEWRXPSVRWKIGATSHSLWLCTLLGML 

t I I I I I I 1 I I I I I I 1 1 J f I I J I I I 1 I I t I I I I I 1 i I I I I I I I I I I I I I I I I M I I I I I I 
orf 33-1 FLRVKVGRFFSSPATWFRGKDPVNQAVLRLYADEWRQPSVRWKIGATSHSLWLCTLLGML 

130 140 150 160 170 180 



190 200 210 220 230 240 

orf 33a . pep VSVLLLLLVRQYTFNWESTLLGDSSSVRLVEMLAWLPAKLGFPVPDARAVIEGRLNGNIA 
I I I I I I II I 1 I I I I I I I I I I I :::: I I \ I I I I I I I I : I I I I I I I I I I I I I I I I I I I I I I 
orf 33-1 VSVLLLLLVRQYTFNWESTLLSNAASVRAVEMLAWLPSKLGFPVPDARAVIEGRLNGNIA 

190 200 210 220 230 240 



                   250       260       270       280       290       300
orf33a.pep  DARAWSGLLVGSIACYGILPRLLAWAVCKILXXTSENGLDLEKXXXXXXIRRWQNKITDA
            |||||||||||||||||||||||||:|||||  ||||||||||      |||||||||||
orf33-1     DARAWSGLLVGSIACYGILPRLLAWVVCKILLKTSENGLDLEKPYYQAVIRRWQNKITDA

                   250       260       270       280       290       300

310 320 330 340 350 360 

orf 33a . pep DTRRETVSAVSPKIVLNDAPKWAVMLETEWQDGEWFEGRLAQEWLDKGVAANREQVAALE 
| | | I | I I I I I I I I I : I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I : It I I I I I I I 
orf 33-1 DTRRETVSAVSPKIILNDAPKWAVMLETEWQDGEWFEGRLAQEWLDKGVATNREQVAALE 

310 320 330 340 350 360 

370 380 390 400 410 420 

orf33a.pep  TELKQKPAQLLIGVRAQTVPDRGVLRQIVRLSEAAQGGAVVXLLAEQGLSDDLSEKLEHW
I I || I I J I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I II I II I I I I I I I I I I I I 
orf33-1     TELKQKPAQLLIGVRAQTVPDRGVLRQIVRLSEAAQGGAVVQLLAEQGLSDDLSEKLEHW

370 380 390 400 410 420 

430 440 450 

orf33a.pep RNALTECGAAWLEPDRAAQEGRLKTNDRTX

I I I I : I I I I I I I I I I I I I I I I I I I 
orf33-1 RNALAECGAAWLEPDRAAQEGRLKDQX

430 440 

Homology with a predicted ORF from N. gonorrhoeae 

ORF33 shows 91.6% identity over a 143aa overlap with a predicted ORF (ORF33.ng) from N. gonorrhoeae:

orf33.pep                               LFLRVKVGRFFSSPATWFRXKDPVNQAVLR  30
                                        I I I I II II II II I I II I I I I I I I I I I I I
orf33ng   LMDNQGLNFFLVLAGVLGMNTLMLAVWLATLFLRVKVGRFFSSPATWFRGKGPVNQAVLR 100

orf33.pep LYXDEWRXTSVRWKIXATSHSLWLCTLLGMLVSVLLLLLVRQYTFNWESTLLSNAASVRA  90
          || | : || I I I I I I I I : I I I I I I I I I I I II I I I I I I I I I I I I I I I I I II I I I I I I M I
orf33ng   LYADQWRQPSVRWKIGATAHSLWLCTLLGMLVSVLLLLLVRQYTFNWESTLLSNAASVRA 160

orf33.pep VEMLAWLPSKLGFPVPDARSVIEGRLNGNIADARAWSGLLVXSIACXGILPRL        143
          M I I II I I I I I M I II I M : I I I I I I I I I I I I I M II I I II 11:1 MINI
orf33ng   VEMLAWLPSKLGFPVPDARAVIEGRLNGNIADARAWSGLLVGSIVCYGILPRLLAWVVCK 220
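The 91.6% figure can be checked directly from this alignment. The sketch below is a minimal illustration. It assumes identity is counted as identical residue pairs divided by the length of an ungapped overlap, and it treats the unknown-residue code X as never identical. The specification does not name the alignment program or its exact counting rules, so both choices are assumptions.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent identity over an ungapped overlap of two equal-length aligned sequences.

    Identical positions are counted and divided by the overlap length.
    'X' (unknown residue) is treated as never identical, which is an assumption.
    """
    if len(a) != len(b) or not a:
        raise ValueError("aligned sequences must be non-empty and of equal length")
    identical = sum(1 for x, y in zip(a, b) if x == y and x != "X")
    return 100.0 * identical / len(a)


# The 143-residue ORF33 fragment and the aligned ORF33ng segment, row by row
# as they appear in the alignment above.
orf33 = (
    "LFLRVKVGRFFSSPATWFRXKDPVNQAVLR"
    "LYXDEWRXTSVRWKIXATSHSLWLCTLLGMLVSVLLLLLVRQYTFNWESTLLSNAASVRA"
    "VEMLAWLPSKLGFPVPDARSVIEGRLNGNIADARAWSGLLVXSIACXGILPRL"
)
orf33ng = (
    "LFLRVKVGRFFSSPATWFRGKGPVNQAVLR"
    "LYADQWRQPSVRWKIGATAHSLWLCTLLGMLVSVLLLLLVRQYTFNWESTLLSNAASVRA"
    "VEMLAWLPSKLGFPVPDARAVIEGRLNGNIADARAWSGLLVGSIVCYGILPRL"
)

print(len(orf33), round(percent_identity(orf33, orf33ng), 1))  # 143 91.6
```

Twelve of the 143 aligned positions differ, which gives 131/143, or 91.6%.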



An ORF33ng nucleotide sequence <SEQ ID 203> was predicted to encode a protein having amino
acid sequence <SEQ ID 204>:

1 MIDRDRMLRD TLERVRAGSF WLWVVVASMM FTAGFSGTYL LMDNQGLNFF

51 LVLAGVLGMN TLMLAVWLAT LFLRVKVGRF FSSPATWFRG KGPVNQAVLR

101 LYADQWRQPS VRWKIGATAH SLWLCTLLGM LVSVLLLLLV RQYTFNWEST

151 LLSNAASVRA VEMLAWLPSK LGFPVPDARA VIEGRLNGNI ADARAWSGLL

201 VGSIVCYGIL PRLLAWVVCK ILLKTSENGL DLEKTYYQAV IRRWQNKITD

251 ADTRRETVSA VSPKIVLNDA PKWALMLETE WQDGQWFEGR LAQEWLDKGV

301 AANREQVAAL ETELKQKPAQ LLIGVRAQTV PDRGVLRQIV RLSEAAQGGA

351 VVQLLAEQGL SDDLSEKLEH WRNALTECGA AWLEPDRVAQ EGRLKDQ*



Further sequence analysis revealed the following DNA sequence <SEQ ID 205>:

1 ATGTTGaatC CATCCCgaAA ACTGgttgag ctGgTCCgtA Ttttgaataa

51 agggggtTTT attttcagcg gcgatcctgt gcaggcgacg gaggctttgc

101 gccgcgtgga cggcAGTACG GAggAaaaaa tcttccgtcg GGCGGAGAtg

151 atcgACAGGg accgtatgtt gcgggACaCg TtggaacGTG TGCGTGCggg

201 gtcgtTctgG TTATGGGTGG TggtggCAtC gATGATGTtt aCCGCCGGAT

251 TTTCAGgcac ttatCttCTG ATGGACaatC AGGGGCtGAA TtTCTTTTTA

301 GTTTTggcgG GAGTGTtggG CATGaatacG ctgATGCTGG CAGTATGGtt

351 gGCAACGTTG TTCCTGCGCG TGAAAGTGGG ACGGTTTTTC AGCAGTCCGG

401 CGACGTGGTT TCGGGGCAAA GGCCCTGTAA ATCAGGCGGT GTTGCGGCTG

451 TATGCGGACC AGTGGCGGCA ACCTTCGGTA CGATGGAAAA TAGGCGCAAC

501 GGCGCACAGC TTGTGGCTCT GCACGCTGCT CGGAATGCTG GTGTCGGTAT

551 TGCTGCTGCT TTTGGTGCGG CAATATACGT TCAACTGGGA AAGCACGCTG

601 TTGAGCAATG CCGCTTCGGT ACGCGCGGTG GAAATGTTGG CATGGCTGCC

651 GTCGAAACTC GGTTTCCCTG TCCCCGATGC GCGGGCGGTC ATCGAAGGTC

701 GTCTGAACGG CAATATTGCC GATGCGCGGG CTTGGTCGGG GCTGCTGGTC

751 GGCAGTATCG TCTGCTACGG CATCCTGCCG CGCCTCTTGG CTTGGGTAGT

801 GTGTAAAATC CTTTTGAAAA CAAGCGAAAA CGGattgGAT TTGGAAAAAA

851 CCTATTATCA GGCGGTCATC CGCCGCTGGC AGAACAAAAT CACCGATGCG 

901 GATACGCGTC GGGAAACCGT GTCCGCCGTT TCGCcgaAAA TCGTCTTGAA 

951 CGATGCGCCG AAATGGGCGC TCATGCTGGA GACCGAGTGG CAGGACGGCC 

1001 AATGGTTCGA GGGCAGGCTG GCGCAGGAAT GGCTGGATAA GGGCGTTGCC 

1051 GCCAATCGGG AACAGGTTGC CGCGCTGGAG ACAGAGCTGA AGCAGAAACC 

1101 GGCGCAACTG CTTATCGGCG TACGCGCCCA AACTGTGCCG GACCGGGGCG 

1151 TGCTGCGGCA GATTGTGCGG CTTTCGGAAG CGGCGCAGGG CGGCGCGGTG 

1201 GTGCAGCTTT TGGCGGAACA GGGGCTTTCA GACGACCTTT CGGAAAAGCT 

1251 GGAACATTGG CGTAACGCGC TGACCGAATG CGGCGCGGCG TGGCTTGAGC 

1301 CTGACAGGGT GGCGCAGGAA GGCCGTTTGA AAGACCAATA A 

This encodes a protein having amino acid sequence <SEQ ID 206; ORF33ng-1>:

1 MLNPSRKLVE LVRILNKGGF IFSGDPVQAT EALRRVDGST EEKIFRRAEM

51 IDRDRMLRDT LERVRAGSFW LWVVVASMMF TAGFSGTYLL MDNQGLNFFL

101 VLAGVLGMNT LMLAVWLATL FLRVKVGRFF SSPATWFRGK GPVNQAVLRL

151 YADQWRQPSV RWKIGATAHS LWLCTLLGML VSVLLLLLVR QYTFNWESTL

201 LSNAASVRAV EMLAWLPSKL GFPVPDARAV IEGRLNGNIA DARAWSGLLV

251 GSIVCYGILP RLLAWVVCKI LLKTSENGLD LEKTYYQAVI RRWQNKITDA

301 DTRRETVSAV SPKIVLNDAP KWALMLETEW QDGQWFEGRL AQEWLDKGVA 

351 ANREQVAALE TELKQKPAQL LIGVRAQTVP DRGVLRQIVR LSEAAQGGAV 

401 VQLLAEQGLS DDLSEKLEHW RNALTECGAA WLEPDRVAQE GRLKDQ* 
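Each protein listing here is the frame-1 translation of the DNA listing before it. The sketch below shows a minimal version of that step. It assumes the standard genetic code (NCBI translation table 1). Rendering codons that contain ambiguity codes such as N as X is an assumption, chosen to match the X residues that appear in the listings.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (first base varies slowest).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate reading frame 1 of a DNA string.

    Whitespace and letter case are ignored (the listings mix upper and lower case),
    a trailing partial codon is dropped, stop codons become '*', and any codon not
    in the table (e.g. one containing N) becomes 'X'.
    """
    seq = "".join(dna.split()).upper()
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


# First row of SEQ ID 205 without its position number: 50 nt, i.e. 16 whole codons.
print(translate("ATGTTGaatC CATCCCgaAA ACTGgttgag ctGgTCCgtA Ttttgaataa"))  # MLNPSRKLVELVRILN
```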

ORF33ng-1 and ORF33-1 show 94.6% identity in 446 aa overlap:

10 20 30 40 50 60 

orf33-1.pep MLNPSRKLVELVRILDEGGFIFSGDPVQATEALRRVDGSTEEKIIRRAEMIDRNRMLRET
I I I I I I I I I I I I I I I :: I I I I I I I I I I I I I I I I I I I I I i I I I I I: I I I t I i I I: I I I I: I 
orf33ng-1 MLNPSRKLVELVRILNKGGFIFSGDPVQATEALRRVDGSTEEKIFRRAEMIDRDRMLRDT

10 20 30 40 50 60 

70 80 90 100 110 120 

orf33-1.pep LERVRAGSFWLWVVAATFAFFTGFSVTYLLMDNQGLNFFLVLAGVLGMNTLMLAVWLAML
I I I I I I I I I ] I I I I : I : : 1 : I I I I I I I I I I I i 1 ! I I I I I I I I I I I I 1 I i I I I I I I ! 
orf33ng-1 LERVRAGSFWLWVVVASMMFTAGFSGTYLLMDNQGLNFFLVLAGVLGMNTLMLAVWLATL

70 80 90 100 110 120 

130 140 150 160 170 180 

orf33-1.pep FLRVKVGRFFSSPATWFRGKDPVNQAVLRLYADEWRQPSVRWKIGATSHSLWLCTLLGML
I I I I 1 I I I I II I I I I I I I I I I I I I I I I i I i I !: I I I I I I I II I I I I : I I I I I I I I I I I I 
orf33ng-1 FLRVKVGRFFSSPATWFRGKGPVNQAVLRLYADQWRQPSVRWKIGATAHSLWLCTLLGML

130 140 150 160 170 180 

190 200 210 220 230 240 

orf33-1.pep VSVLLLLLVRQYTFNWESTLLSNAASVRAVEMLAWLPSKLGFPVPDARAVIEGRLNGNIA
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I M I I I I I I I I I I I I I I I I I 
orf33ng-1 VSVLLLLLVRQYTFNWESTLLSNAASVRAVEMLAWLPSKLGFPVPDARAVIEGRLNGNIA

190 200 210 220 230 240 

250 260 270 280 290 300

orf33-1.pep DARAWSGLLVGSIACYGILPRLLAWVVCKILLKTSENGLDLEKPYYQAVIRRWQNKITDA
M | It I I M I I I I : I I I I I M I I I I I M I M I I II I I II I I I I I I I I I I I M I I I I I I I 
orf33ng-1 DARAWSGLLVGSIVCYGILPRLLAWVVCKILLKTSENGLDLEKTYYQAVIRRWQNKITDA

250 260 270 280 290 300 

310 320 330 340 350 360 

orf33-1.pep DTRRETVSAVSPKIILNDAPKWAVMLETEWQDGEWFEGRLAQEWLDKGVATNREQVAALE
I I I I I II I I I I I II : I I I I I I i I : I I I I I I I I I : I I I I I I I I I I I I I I M : I I I M I I I I 
orf33ng-1 DTRRETVSAVSPKIVLNDAPKWALMLETEWQDGQWFEGRLAQEWLDKGVAANREQVAALE

310 320 330 340 350 360 

370 380 390 400 410 420 

orf33-1.pep TELKQKPAQLLIGVRAQTVPDRGVLRQIVRLSEAAQGGAVVQLLAEQGLSDDLSEKLEHW
|| | I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I I I I I I I I I I I I I I I I 
orf33ng-1 TELKQKPAQLLIGVRAQTVPDRGVLRQIVRLSEAAQGGAVVQLLAEQGLSDDLSEKLEHW

370 380 390 400 410 420 

430 440 
orf33-1.pep RNALAECGAAWLEPDRAAQEGRLKDQX
I M I : I II I I I I I I I I : I I I I I I I I I I 
orf33ng-1 RNALTECGAAWLEPDRVAQEGRLKDQX

430 440 

Based on the presence of several putative transmembrane domains in the gonococcal protein, it is 
predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be
useful antigens for vaccines or diagnostics, or for raising antibodies. 
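The specification does not say how the putative transmembrane domains were located. One common approach is a Kyte-Doolittle hydropathy scan. The sketch below assumes a 19-residue window and a mean-hydropathy threshold of 1.6, values often used for this purpose, and it is not necessarily the method used here.

```python
# Kyte-Doolittle hydropathy index for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobic_windows(protein, window=19, threshold=1.6):
    """Return (1-based start, mean hydropathy) for each window scoring >= threshold.

    Windows containing characters without a Kyte-Doolittle value
    (e.g. 'X' or the stop symbol '*') are skipped.
    """
    seq = protein.upper()
    hits = []
    for start in range(len(seq) - window + 1):
        segment = seq[start:start + window]
        if not all(aa in KYTE_DOOLITTLE for aa in segment):
            continue
        mean = sum(KYTE_DOOLITTLE[aa] for aa in segment) / window
        if mean >= threshold:
            hits.append((start + 1, round(mean, 2)))
    return hits
```

Overlapping windows that pass the threshold would then be merged into candidate membrane-spanning segments.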

Example 25 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 207>:

1 . . CAGAAGAGTT TGTCGAGAAT TTCTTTATGG GGTTTGGGCG GCGTGTTTTT 

51 CGGGGTGTCC GGTCTGGTAT GGTTTTCTTT GGGCGTTTCT TT.GAGTGCG 

101 CCTGTTTTTC GGGTGTTTCT TTTCGGGGTT CGGGACGGGG GACGTTTGTG 

151 GGCAGTACGG GGGTTTCTTT GAGTGTGTTT TCAGCTTGTG TTCC.GGCGT 

201 CGTCCGGCTG CCTGTCGGTT TGAGCTGTGT CGGCAGGTTG CG..GTTTGA 

251 CCCGGTTTTT CTTGGGTGCG GCAGGGGACG TCATTCTCCT GCCGCTTTCG 

301 TCTGTGCCGT CCGGCTGTGC GGGTTCGGAT GAGGCGGCGT GGTGGTGTTC 

351 GGGTTGGGCG GCATCTTGTJ CCGACTACGC CGTTTGGCAG CCAGAATTCG 

401 GTTTCGCGGG GGCTGTCGGT GTGTTGCGGT TCGGCTTGAA GGGTTTTGTC 

451 GTCC. 

This corresponds to the amino acid sequence <SEQ ID 208; ORF34>: 

1 ..QKSLSRISLW GLGGVFFGVS GLVWFSLGVS XECACFSGVS FRGSGRGTFV 
51 GSTGVSLSVF SACVXGWRL PVGLSCVGRL XXLTRFFLGA AGDVILLPLS 
101 SVPSGCAGSD EAAWWCSGWA ASCPTTPFGS QNSVSRGLSV CCGSA*RVLS 
151 S.. 

Further work revealed the complete nucleotide sequence <SEQ ID 209>:



1 ATGATGATGC CGTTCATAAT GCTTCCTTGG ATTGCkGGTG TGCCTGCCGT 

51 GCCGGGTCAG AATAGGTTGT CCAGAATTTC TTTATGGGGT TTGGGCGGCG 

101 TGTTTTTCGG GGTGTCCGGT TTGGTATGGT TTTCTTTGGG CGTTTCTTTG 

151 GGCTGCGCCT GTTTTTCGGG TGTTTCTTTT CGGGGTTCGG GACGGGGGAC 

201 GTTTGTGGGC AGTACGGGGG TTTCTTTGAG TGTGTTTTCA GCTTGTGTTC 

251 CGGCGTCGTC CGGCTGCCTG TCGGTTTGAG CTGTGTCGGC AGGTTGCGGT 

301 TTGACCCGGT TTTTCTTGGG TGCGGCAGGG GACGGCAGTC CGCTGCCGCT 

351 TTCGTCTGTG CCGTCCGGCT GTGCGGGTTC GGATGAGGCG GCGTGGTGGT 

401 GTTCGGGTTG GGCGGCATCT TGTCCGACTA CGCCGTTTGG CAGCCAGAAT 

451 TCGGTTTCGC GGGGGCTGTC GGTGTGTTGC GGTTCGGCTT GAAGGGTTTT 

501 GTCGCCGTTC GGGTTGAATG TGCTGACGAT GCCTATTGCC AATGCGCCGA 

551 TGGCGGCGAT ACAGATGAGC AATACGGCGC GTATCAGGAG TTTGGGGGTC 

601 AGCCTGAAGG GTTTGTTCGG TTTTTTTGCC ATTTTGATTG TGCTTTTGGG 

651 GTGTCGGGCA ATGCCGTCTG AAGGCGGTTC AGACGGCATT GCCGAGTCAG 

701 CGTTGGACGT AGTTTTGGTA GAGGGTGATG ACTTTTTGTA CGCCGACGGT 

751 GGTGCTGACT TTTTGGGTAA TCTGCGCCTG TTCTTCGGGG GTGAGGATGC 

801 CCATAACGTA GGTTACGTTG CCGTAGGTAA CGATTTTGAC GCGCGCCTGT 

851 GTGGCGGGGC TGATGCCCAA CAGCGTGGCG CGGACTTTGG ATGTGTTCCA 

901 AGTGTCGCCG GCGATGTCGC CGGCAGTGCG CGGCAGGGAG GCGACGGTAA 

951 TATAGTTGTA CACGCCTTCG GCGGCCTGTT CGGAACGTGC AATCTGACCG 

1001 ACGAACTGTT TTTCGCCTTC GGTGGCGACT TGTCCGAGCA GCAGCAGGTG 

1051 GCGGTTGTAG CCGACGACGG AGATTTGGGG CGTGTAGCCT TTGGTTTGGT 

1101 TGTTTTGGCG CAGATAGGAA CGGGCGGTGG TTTCGATACG CAACGCCATA 

1151 ACGTTGTCGT CGGTTTGCGC GCCGGTGGTT CGGCGGTCGA CGGCGGATTT 

1201 CGCGCCGACG GCGGCGCTTC CGATTACTGC GCTGACGCAG CCGCTAAGGG 

1251 CAAGGCTGAA AATGGCGGCA ATCAGGGTGC GGACGGTGTG CGGTTTGGGT 

1301 TTCATCGGGT GCTTCCTTTC TTGGGCGTTT CAGACGGCAT TGCTTTGCGC 

1351 CATGCCGTCT GA 

This corresponds to the amino acid sequence <SEQ ID 210; ORF34-l>: 



1 MMMPFIMLPW IAGVPAVPGQ NRLSRISLWG LGGVFFGVSG LVWFSLGVSL

51 GCACFSGVSF RGSGRGTFVG STGVSLSVFS ACVPASSGCL SV*AVSAGCG

101 LTRFFLGAAG DGSPLPLSSV PSGCAGSDEA AWWCSGWAAS CPTTPFGSQN

151 SVSRGLSVCC GSA*RVLSPF GLNVLTMPIA NAPMAAIQMS NTARIRSLGV

201 SLKGLFGFFA ILIVLLGCRA MPSEGGSDGI AESALDVVLV EGDDFLYADG

251 GADFLGNLRL FFGGEDAHNV GYVAVGNDFD ARLCGGADAQ QRGADFGCVP

301 SVAGDVAGSA RQGGDGNIVV HAFGGLFGTC NLTDELFFAF GGDLSEQQQV

351 AVVADDGDLG RVAFGLVVLA QIGTGGGFDT QRHNVVVGLR AGGSAVDGGF

401 RADGGASDYC ADAAAKGKAE NGGNQGADGV RFGFHRVLPF LGVSDGIALR

451 HAV*



Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A) 

ORF34 shows 73.3% identity over a 161aa overlap with an ORF (ORF34a) from strain A of N. meningitidis:



10 20 30
orf34.pep                      QKSLSRISLWGLGGVFFGVSGLVWFSLGVSXE------CAC
                               II Ml I I I I I I I I I I I I I I I I I I I I I I I Ml
orf34a      MMXPXIMLPWIAGVPAVPGQKRLSRXSLWGLGGXFFGVSGLVWFSLGVSXSLGVSXGCAC
10 20 30 40 50 60

40 50 60 70 80 90
orf34.pep   FSGVSFRGSGRG TFVGSTGVSLSVFSACV XGWRLPVGLSCVGRLXX LTRFFLGA
            TTTTl I I I I II I I I I I I I I I I I I I I I I I : Ill I II
orf34a      FSGV S FRG SGRG TFVGSTGVSLSVFSACA PAS SGCL S VXAVS AGCGLTRX FXGA
70 80 90 100 110

100 110 120 130 140 150
orf34.pep   AGDVILLPLSSVPSGCAGSDEAAWWCSGWAASCPTTPFGSQNSVSRGLSVCCGSAXRVLS
            Ml 111111111111:11 I M M I M I II I I I M II I M I I I I I I I I I : I I I I
orf34a      AGDGSPLPLSSVPSGCAGADEEAXXCSGWAASCPTTPFGSQNSVSRGLSVCCGSVWRVLS
120 130 140 150 160 170

orf34.pep   S
orf34a      PFGXNVLTMPIANAPMAVIQMSNTARIRSLGVSLKGLFXFFAILIVLLGCRAMPSEGGSD
180 190 200 210 220 230

The complete length ORF34a nucleotide sequence <SEQ ID 211> is:

1 ATGATGATNC CGTTNATAAT GCTTCCTTGG ATTGCGGGTG TGCCTGCCGT 

51 GCCGGGTCAG AAGAGGTTGT CGAGAANTTC TTTATGGGGT TTAGGCGGCN 

101 TGTTTTTCGG GGTGTCCGGT TTGGTATGGT TTTCTTTGGG CGTTTCTNTT 

151 TCTTTGGGTG TTTCTNTGGG CTGTGCCTGT TTTTCGGGTG TTTCTTTTCG 

201 GGGTTCGGGA CGGGGGACGT TTGTGGGCAG TACNGGGGTT TCTTTGAGTG 

251 TGTTTTCAGC TTGTGCTCCG GCGTCGTCCG GCTGCCTGTC GGTTTNAGCT 

301 GTGTCGGCAG GTTGCGGTTT GACCCGGNTT TTCTTNGGTG CGGCAGGGGA 

351 CGGCAGTCCG CTGCCGCTTT CGTCTGTGCC GTCCGGCTGT GCGGGTGCGG 

401 ATGAGGAGGC GTNGTNGTGT TCGGGTTGGG CGGCATCTTG TCCGACTACG 

451 CCGTTTGGCA GCCAGAATTC GGTTTCGCGG GGGCTGTCGG TGTGTTGCGG 

501 TTCGGTNTGG AGGGTTTTGT CNCCGTTCGG GTNGAATGTG CTGACGATGC 

551 CTATTGCCAA TGCGCCGATG GCGGTGATAC AGATGAGCAA TACGGCGCGT 

601 ATCAGGAGTT TGGGGGTCAG CCTGAAGGGT TTGTTCNGTT TTTTTGCCAT 

651 TTTGATTGTG CTTTTGGGGT GTCGGGCAAT GCCGTCTGAA GGCGGTTCAG 

701 ACGGCATTGC CGAGTCAGCG TTGGACGTAG TTTNGGTAGA GGGTGATGAC 

751 TTTTTGTACG CCGACGGTGG TGCTGACTTT TTGGGTAATC TGCGCCTGTT 

801 CTTCGGGGGT GAGGATGCCC ATAACGTAGG TTACGTTGCC GTAGGTAACG 

851 ATTTTGACGC GCGCCTGTGT GGCGGGGCTG ATGCCCAACA GCGTGGCGCG 

901 GACTTTGGAT GTGTTCCAAG TGTCGCCGGC GATGTCGCCG GCAGTGCGCG 

951 GCAGGGAGGC GACGGTAATG TANTTGTACA CGCCTTCGGC GGCCTGTTCG 

1001 GAACGTGCAA TCTGACCGAC GAACTGTTTC TCGCCTTCGG TGGCGACTTG 

1051 TCCGAGCAGC AGCAGGTGGC GGTTGTAGCC GACAACGGAG ATTTGGGGCG 

1101 TGTANCCTTT GGTTTGGTTG TTTTGGCGCA GATAGGAGCG GGCGGTGGTT 

1151 TCGATACGCA GCGCCATTAC GTTGTCGTCG GTTNGCGCGC CGGTGGTTCG 

1201 GCGGTCGACG GCGGATTTCG CGCCGACCGC CGCGCCGCCG ACGACTGCGC 

1251 TGACGCAGCC GCCGAGGGCA AGGCTGAGGA CGGCGGCAGT CAGGGTGCGG 

1301 ACGGTGTGCG GTTTGGGTTT CATCGGGTGC TTCCTTTCTT GGGCGTTTCA 

1351 GACGGCATTG CTTTGCGCCA TGCCGTCTGA 
This encodes a protein having amino acid sequence <SEQ ID 212>:

1 MMXPXIMLPW IAGVPAVPGQ KRLSRXSLWG LGGXFFGVSG LVWFSLGVSX

51 SLGVSXGCAC FSGVSFRGSG RGTFVGSTGV SLSVFSACAP ASSGCLSVXA

101 VSAGCGLTRX FXGAAGDGSP LPLSSVPSGC AGADEEAXXC SGWAASCPTT

151 PFGSQNSVSR GLSVCCGSVW RVLSPFGXNV LTMPIANAPM AVIQMSNTAR

201 IRSLGVSLKG LFXFFAILIV LLGCRAMPSE GGSDGIAESA LDVVXVEGDD

251 FLYADGGADF LGNLRLFFGG EDAHNVGYVA VGNDFDARLC GGADAQQRGA

301 DFGCVPSVAG DVAGSARQGG DGNVXVHAFG GLFGTCNLTD ELFLAFGGDL

351 SEQQQVAVVA DNGDLGRVXF GLVVLAQIGA GGGFDTQRHY VVVGXRAGGS

401 AVDGGFRADR RAADDCADAA AEGKAEDGGS QGADGVRFGF HRVLPFLGVS

451 DGIALRHAV*



ORF34a and ORF34-1 show 91.3% identity in 459 aa overlap: 






10 20 30 40 50 60 
orf34a.pep MMXPXIMLPWIAGVPAVPGQKRLSRXSLWGLGGXFFGVSGLVWFSLGVSXSLGVSXGCAC
It i I I I II I I I I I I I I I I : I I I I I I I I I I I I I I I I I I I I I I I II I Ml! 
orf34-1 MMMPFIMLPWIAGVPAVPGQNRLSRISLWGLGGVFFGVSGLVWFSLGVSL------GCAC

10 20 30 40 50 

70 80 90 100 110 120 

orf34a.pep FSGVSFRGSGRGTFVGSTGVSLSVFSACAPASSGCLSVXAVSAGCGLTRXFXGAAGDGSP

I J I II I I I I I I I I I I I I I I I I I II I I I I : I I I I I I I I I I I I I I I I II I I I I I I I I I I I 
orf34-1 FSGVSFRGSGRGTFVGSTGVSLSVFSACVPASSGCLSVXAVSAGCGLTRFFLGAAGDGSP
60 70 80 90 100 110 

130 140 150 160 170 180 

orf34a.pep LPLSSVPSGCAGADEEAXXCSGWAASCPTTPFGSQNSVSRGLSVCCGSVWRVLSPFGXNV

I I I I I II I I I I I : I I I I I I I I I I I I I I II I II I I I I I I I I I I I I I : I I I I I I I II 
orf34-1 LPLSSVPSGCAGSDEAAWWCSGWAASCPTTPFGSQNSVSRGLSVCCGSAXRVLSPFGLNV
120 130 140 150 160 170 

190 200 210 220 230 240 

orf34a.pep LTMPIANAPMAVIQMSNTARIRSLGVSLKGLFXFFAILIVLLGCRAMPSEGGSDGIAESA

I I I I I I I I I I I : I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I M I M 
orf34-1 LTMPIANAPMAAIQMSNTARIRSLGVSLKGLFGFFAILIVLLGCRAMPSEGGSDGIAESA

180 190 200 210 220 230 

250 260 270 280 290 300 

orf34a.pep LDVVXVEGDDFLYADGGADFLGNLRLFFGGEDAHNVGYVAVGNDFDARLCGGADAQQRGA
MM I I II II I I I I I M I I II II II I M I II I I I I I I I I II I II I I II II II 1 II II I I 
orf34-1 LDVVLVEGDDFLYADGGADFLGNLRLFFGGEDAHNVGYVAVGNDFDARLCGGADAQQRGA
240 250 260 270 280 290 

310 320 330 340 350 360 

orf34a.pep DFGCVPSVAGDVAGSARQGGDGNVXVHAFGGLFGTCNLTDELFLAFGGDLSEQQQVAVVA

II I I II I I I I I I II II II II I II : I I II I I II I II I I I II I I : I II I M II II I II II I 
orf34-1 DFGCVPSVAGDVAGSARQGGDGNIVVHAFGGLFGTCNLTDELFFAFGGDLSEQQQVAVVA

300 310 320 330 340 350 

370 380 390 400 410 420 

orf34a.pep DNGDLGRVXFGLVVLAQIGAGGGFDTQRHYVVVGXRAGGSAVDGGFRADRRAADDCADAA
I : I I II I I II I II II II I : II I M I I II I I I I I I I I I I I I I I II I I Ml I II I I 
orf34-1 DDGDLGRVAFGLVVLAQIGTGGGFDTQRHNVVVGLRAGGSAVDGGFRADGGASDYCADAA
360 370 380 390 400 410 

430 440 450 460 

orf34a.pep AEGKAEDGGSQGADGVRFGFHRVLPFLGVSDGIALRHAVX

I : I I I I : IE : I M I I I I I I I I I I I I I I I I I I I M I I i 1 I I 
orf34-1 AKGKAENGGNQGADGVRFGFHRVLPFLGVSDGIALRHAVX
420 430 440 450 

Homology with a predicted ORF from N. gonorrhoeae 

ORF34 shows 77.6% identity over a 161aa overlap with a predicted ORF (ORF34.ng) from N. 
gonorrhoeae: 



orf34.pep                    QKSLSRISLWGLGGVFFGVSGLVWFSLGVSXE------CAC  35
                             II I I I I I I I I I : I I I I II I I I ! I I I I II I III
orf34ng   MMMPFIMLPWIAGVPAVPGQKRLSRISLWGLAGVFFGVSGLVWFSLGVSFSLGVSLGCAC  60

orf34.pep FSGVSFRGSGRGTFVGSTGVSLSVFSACVXGVVRLPVGLSCV-GRLXXLTRFFLGA  90
          I | | I I I I I I I I : I I I I I I I I I I I I I I I I : I I : I : II I I I I I I I I
orf34ng   FSGVSFRGSGWGAFVGSTGVSLSVFSACVPVPVNESAARAASEGR--GLTRFFLGA 114

orf34.pep AGDVILLPLSSVPSGCAGSDEAAWWCSGWAASCPTTPFGSQNSVSRGLSVCCGSAXRVLS 150
          Ml I I I I I I M I I I I M I I I I I I I I I I I I I I I s I I I I I I I I I I I I I I I I I I : I I II
orf34ng   AGDGSPLPLSSVPSGCAGSDEAAWWCSGWAASCPTAPFGSQNSVSRGLSVCCGSVWRVLS 174

orf34.pep S 175
orf34ng   PFGLNVLTMPTANAPMAVIQMSNTARIRSLGVSLKGLFGFFAILIVLLGCRAMPSEGGSD 234



The complete length ORF34ng nucleotide sequence <SEQ ID 213> is: 



1 ATGATGATGC CGTTCATAAT GCTTCCTTGG ATTGCGGGTG TGCCTGCCGT

51 GCCGGGTCAA AAGAGGTTGT CGAGAATCTC TTTATGGGGT TTGGCCGGCG

101 TGTTTTTCGG GGTGTCCGGT TTGGTATGGT TTTCTTTGGG CGTTTCTTTT

151 TCTTTGGGTG TTTCTTTGGG CTGCGCCTGT TTTTCGGGTG TTTCTTTTCG

201 GGGTTCGGGA TGGGGGGCGT TTGTGGGCAG TACGGGGGTT TCTTTGAGTG

251 TGTTTTCAGC TTGTGTTCCG GTGCCGGTTA ACGAATCGGC TGCCCGGGCC

301 GCATCCGAAG GGCGCGGTTT gACCCGGTTT TTCTTGGGTG CGGCAGGGGA

351 CGGCAGTCCG CTGCCGCTTT CTTCTGTGCC GTCCGGCTGT GCGGGTTCGG

401 ATGAGGCGGC GTGGTGGTGT TCGGGTTGGG CGGCATCTTG TCCGACGGCG

451 CCGTTTGGCA GCCAGAATTC GGTTTCGCGG GGGCTGTCGG TGTGTTGCGG

501 TTCGGTTTGG AGGGTTTTGT CGCCGTTCGG GTTGAATGTG CTGACGATGC

551 CTACTGCCAA TGCGCCGATG GCGGTGATAC AGATGAGCAA TACGGCGCGT

601 ATCAGGAGTT TGGGGGTCAG CCTGAAGGGT TTGTTCGGTT TTTTTGCCAT

651 TTTGATTGTG CTTTTGGGGT GTCGGGCAAT GCCGTCTGAA GGCGGTTCAG

701 ACGGCATTGC CGAGTCAGCG TTGGACGTAG TTTTGGTAGA GGGTAATGAC

751 TTTTTGTACG CCGAcggTGG TGCTGACTTT TTGGGTAATC TGCGCCTGTT

801 CTTCGGGGGT GAGGATGCCC ATAACGTAGG TTACATTGCC GTAGGTAATG

851 ATTTTGACGC GCGCCTGTGT AGCGGGGCTG ATGCCCAGCA GcgtgGCGCG

901 GACTTTGGAC GTGTTCCAAG TGTCGCCGGC GATGTCGCCC GCAGTGCGCG

951 GCAGGGAGGC GACGGTAATG TAGTTGTATA CGCCTTCGGC GGCCTGTTCG

1001 GAACGTGCAA TCTGACCGAC GAACTGTTTT TCGCCTTCGG TGGCGACTTG

1051 TCCGAGCAGC AGCAGGTGGC GGTTGTAGCC GACGACGGAG ATTTGGGGCG

1101 TGTAGCCTTT GGTTTGGTTG TTTTGGCGCA GGTAGGAACG GGCGGTGGTT

1151 TCGATACGCA ACGCCATAAC GTtgtCATCG GTTtgcgcgc CGGTGGTTcg

1201 gCGGTCGATG ACGGATTTTG CGCCGACGGC GGCCCCGCCG ACGACTGCGC

1251 TGAAGCAGCC GCCGAGGGCA AGGCTGAGGA CGGCGGCAAT CAGGGTGCGG

1301 ACGGTGTGTG GTTTGGGTTT CATCGGGGAC TTCCTTTCTT GGGCGTTTCA

1351 GACGGCATTG CTTTGCGCCA TGCCGTCTGA

This encodes a protein having amino acid sequence <SEQ ID 214>:



1 MMMPFIMLPW IAGVPAVPGQ KRLSRISLWG LAGVFFGVSG LVWFSLGVSF

51 SLGVSLGCAC FSGVSFRGSG WGAFVGSTGV SLSVFSACVP VPVNESAARA

101 ASEGRGLTRF FLGAAGDGSP LPLSSVPSGC AGSDEAAWWC SGWAASCPTA

151 PFGSQNSVSR GLSVCCGSVW RVLSPFGLNV LTMPTANAPM AVIQMSNTAR

201 IRSLGVSLKG LFGFFAILIV LLGCRAMPSE GGSDGIAESA LDVVLVEGND

251 FLYADGGADF LGNLRLFFGG EDAHNVGYIA VGNDFDARLC SGADAQQRGA

301 DFGRVPSVAG DVARSARQGG DGNVVVYAFG GLFGTCNLTD ELFFAFGGDL

351 SEQQQVAVVA DDGDLGRVAF GLVVLAQVGT GGGFDTQRHN VVIGLRAGGS

401 AVDDGFCADG GPADDCAEAA AEGKAEDGGN QGADGVWFGF HRGLPFLGVS

451 DGIALRHAV*

ORF34ng and ORF34-1 show 90.0% identity in 459 aa overlap: 



10 20 30 40 50

orf34-1.pep MMMPFIMLPWIAGVPAVPGQNRLSRISLWGLGGVFFGVSGLVWFSLGVS------LGCAC

I I I 1 I J 1 I I I I 1 I 1 t 1 I I I 1 r I S 1 I I I 1 I I I r 1 I I I I I 1 I 1 I 1 I I I I I I II III 

orf34ng MMMPFIMLPWIAGVPAVPGQKRLSRISLWGLAGVFFGVSGLVWFSLGVSFSLGVSLGCAC

10 20 30 40 50 60 

60 70 80 90 100 110 

orf34-1.pep FSGVSFRGSGRGTFVGSTGVSLSVFSACVPASSGCLSVXAVSAGCGLTRFFLGAAGDGSP
I I I I I I I I I I I : I I I I I I I I I I I I J I I I I : : hi I I I I I I I I I I I I I I 1 I 

orf34ng FSGVSFRGSGWGAFVGSTGVSLSVFSACVPVPVNESAARAASEGRGLTRFFLGAAGDGSP 
70 80 90 100 110 120
120 130 140 150 160 170 

orf34-1.pep LPLSSVPSGCAGSDEAAWWCSGWAASCPTTPFGSQNSVSRGLSVCCGSAXRVLSPFGLNV
I I I I I I i I I I I I I I I I I I I I I I I I I I I I I : I I I I I I I I I I I I I I I I I I : I I I I I I I I I I 
orf34ng LPLSSVPSGCAGSDEAAWWCSGWAASCPTAPFGSQNSVSRGLSVCCGSVWRVLSPFGLNV 

130 140 150 160 170 180 

180 190 200 210 220 230 

orf34-1.pep LTMPIANAPMAAIQMSNTARIRSLGVSLKGLFGFFAILIVLLGCRAMPSEGGSDGIAESA
MM M M M : M M M M M M M M M M M M M M M M M M M M M M M M 
orf34ng LTMPTANAPMAVIQMSNTARIRSLGVSLKGLFGFFAILIVLLGCRAMPSEGGSDGIAESA 

190 200 210 220 230 240 

240 250 260 270 280 290 

orf34-1.pep LDVVLVEGDDFLYADGGADFLGNLRLFFGGEDAHNVGYVAVGNDFDARLCGGADAQQRGA
lllllll|:||||llll!lllllllllilllllllllhlllllllllll:lllllflll 
orf34ng LDVVLVEGNDFLYADGGADFLGNLRLFFGGEDAHNVGYIAVGNDFDARLCSGADAQQRGA 

250 260 270 280 290 300 

300 310 320 330 340 350 

orf34-1.pep DFGCVPSVAGDVAGSARQGGDGNIVVHAFGGLFGTCNLTDELFFAFGGDLSEQQQVAVVA
Mi M M M M 1 M M M M I : M : M M M M M M M M M M M M M M M M I 
orf34ng DFGRVPSVAGDVARSARQGGDGNVVVYAFGGLFGTCNLTDELFFAFGGDLSEQQQVAVVA

310 320 330 340 350 360 

360 370 380 390 400 410 

orf34-1.pep DDGDLGRVAFGLVVLAQIGTGGGFDTQRHNVVVGLRAGGSAVDGGFRADGGASDYCADAA
M M M M M I M M M M M M M M i M M M M M M M I I i MM : I M M I 
orf34ng DDGDLGRVAFGLVVLAQVGTGGGFDTQRHNVVIGLRAGGSAVDDGFCADGGPADDCAEAA

370 380 390 400 410 420 

420 430 440 450 

orf34-1.pep AKGKAENGGNQGADGVRFGFHRVLPFLGVSDGIALRHAVX
MMMMMMMM IMM MMMMMMMMI 
orf34ng AEGKAEDGGNQGADGVWFGFHRGLPFLGVSDGIALRHAVX

430 440 450 460 

Based on this analysis, including the presence of a putative leader sequence (double-underlined) 
and several putative transmembrane domains (single-underlined) in the gonococcal protein, it is 
predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be
useful antigens for vaccines or diagnostics, or for raising antibodies. 



Example 26 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 215>:



1 ATGAAAACCT TCTTCAAAAC CCTTTCCGCC GCCGCACTCG CGCTCATCCT

51 CGCCGCCTGC GGATT.CAAA AAGACAGCGC GCCCGCCGCA TCCGCTTCTG

101 CCGCCGCCGA CAACGGCGCG GCGTAAAAAA GAAATCGTCT TCGGCACGAC

151 CGTCGGCGAC TTCGGCGATA TGGTCAAAGA ACAAATCCAA GCCGAGCTGG

201 AGAAAAAAGG CTACACCGTC AAACTGGTCG AGTTTACCGA CTATGTACGC

251 CCGAATCTGG CATTGGCTGA GGGCGAGTTG

This corresponds to the amino acid sequence <SEQ ID 216; ORF4>:

1 MKTFFKTLSA AALALILAAC G.QKDSAPAA SASAAADNGA AKKEIVFGTT 
51 VGDFGDMVKE QIQAELEKKG YTVKLVEFTD YVRPNLALAE GEL 

Further sequence analysis revealed the complete nucleotide sequence <SEQ ID 217>: 






1 ATGAAAACCT TCTTCAAAAC CCTTTCCGCC GCCGCACTCG CGCTCATCCT 

51 CGCCGCCTGC GGCGGTCAAA AAGACAGCGC GCCCGCCGCA TCCGCTTCTG 

101 CCGCCGCCGA CAACGGCGCG GCGAAAAAAG AAATCGTCTT CGGCACGACC 

151 GTCGGCGACT TCGGCGATAT GGTCAAAGAA CAAATCCAAG CCGAGCTGGA 

201 GAAAAAAGGC TACACCGTCA AACTGGTCGA GTTTACCGAC TATGTACGCC 
251 CGAATCTGGC ATTGGCTGAG GGCGAGTTGG ACATCAACGT CTTCCAACAC

301 AAACCCTATC TTGACGACTT CAAAAAAGAA CACAATCTGG ACATCACCGA 

351 AGTCTTCCAA GTGCCGACCG CGCCTTTGGG ACTGTACCCG GGCAAGCTGA 

401 AATCGCTGGA AGAAGTCAAA GACGGCAGCA CCGTATCCGC GCCCAACGAC

451 CCGTCCAACT TCGCCCGCGT CTTGGTGATG CTCGACGAAC TGGGTTGGAT 

501 CAAACTCAAA GACGGCATCA ATCCGTTGAC CGCATCCAAA GCGGACATCG 

551 CCGAGAACCT GAAAAACATC AAAATCGTCG AGCTTGAAGC CGCGCAACTG 

601 CCGCGTAGCC GCGCCGACGT GGATTTTGCC GTCGTCAACG GCAACTACGC 

651 CATAAGCAGC GGCATGAAGC TGACCGAAGC CCTGTTCCAA GAACCGAGCT 

701 TTGCCTATGT CAACTGGTCT GCCGTCAAAA CCGCCGACAA AGACAGCCAA 

751 TGGCTTAAAG ACGTAACCGA GGCCTATAAC TCCGACGCGT TCAAAGCCTA 

801 CGCGCACAAA CGCTTCGAGG GCTACAAATC CCCTGCCGCA TGGAATGAAG 

851 GCGCAGCCAA ATAA 

This corresponds to the amino acid sequence <SEQ ID 218; ORF4-1>:



1 MKTFFKTLSA AALALILAAC GGQKDSAPAA SASAAADNGA AKKEIVFGTT

51 VGDFGDMVKE QIQAELEKKG YTVKLVEFTD YVRPNLALAE GELDINVFQH 

101 KPYLDDFKKE HNLDITEVFQ VPTAPLGLYP GKLKSLEEVK DGSTVSAPND 

151 PSNFARVLVM LDELGWIKLK DGINPLTASK ADIAENLKNI KIVELEAAQL 

201 PRSRADVDFA VVNGNYAISS GMKLTEALFQ EPSFAYVNWS AVKTADKDSQ

251 WLKDVTEAYN SDAFKAYAHK RFEGYKSPAA WNEGAAK* 
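The sequence listings share one layout. Each row begins with the 1-based position of its first residue, followed by blocks of ten residues, and the last row may be shorter. A small parser can rebuild the sequence and check each row's start position, which catches characters lost or merged during transcription. The sketch below assumes exactly that layout, and the function name is illustrative.

```python
def parse_listing(text: str) -> str:
    """Rebuild a sequence from a numbered listing, verifying each row's start position."""
    sequence = ""
    for line in text.strip().splitlines():
        if not line.strip():
            continue  # listings separate rows with blank lines
        start, *blocks = line.split()
        expected = len(sequence) + 1
        if int(start) != expected:
            raise ValueError(f"row labelled {start} but sequence so far implies {expected}")
        sequence += "".join(blocks)
    return sequence


# ORF4-1 (SEQ ID 218); residues 211-220 are written VVNGNYAISS, the reading of
# GTC GTC AAC GGC AAC TAC GCC ATA AGC AGC in SEQ ID 217.
orf4_1 = parse_listing("""
1 MKTFFKTLSA AALALILAAC GGQKDSAPAA SASAAADNGA AKKEIVFGTT

51 VGDFGDMVKE QIQAELEKKG YTVKLVEFTD YVRPNLALAE GELDINVFQH

101 KPYLDDFKKE HNLDITEVFQ VPTAPLGLYP GKLKSLEEVK DGSTVSAPND

151 PSNFARVLVM LDELGWIKLK DGINPLTASK ADIAENLKNI KIVELEAAQL

201 PRSRADVDFA VVNGNYAISS GMKLTEALFQ EPSFAYVNWS AVKTADKDSQ

251 WLKDVTEAYN SDAFKAYAHK RFEGYKSPAA WNEGAAK*
""")
print(len(orf4_1.rstrip("*")))  # 287
```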

Computer analysis of this amino acid sequence gave the following results: 



Homology with a predicted ORF from N. meningitidis (strain A) 

ORF4 shows 93.5% identity over a 93aa overlap with an ORF (ORF4a) from strain A of N. 
meningitidis: 

10 20 30 40 50 59 

orf4.pep MKTFFKTLSAAALALILAACG-QKDSAPAASASAAADNGAAKKEIVFGTTVGDFGDMVKE
I I I I I I I I t I I I I I I I I I I I I I I 1 I I I I I I I I I I I I I I I I I I 1 I I I I I I I I I i I f I I I 
orf4a MKTFFKTLSAAALALILAACGGQKDSAPAASASAAADNGAAXKEIVFGTTVGDFGDMVKE

10 20 30 40 50 60 



60 70 80 90 

orf4.pep QIQAELEKKGYTVKLVEFTDYVRPNLALAEGEL

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
orf4a XIQPELEKKGYTVKLVEXTDYVRXNLALAEGELDINVXQHXXYLDDXKKXHNLDITXVXQ

70 80 90 100 110 120 



orf4a VPTAPLGLYPGKLKSLXXVKXGSTVSAPNDPXXFXRVLVMLDELGXIKLKDXIXXXXXXX
130 140 150 160 170 180 



The complete length ORF4a nucleotide sequence <SEQ ID 219> is: 



1 ATGAAAACCT TCTTCAAAAC CCTTTCCGCC GCCGCACTCG CGCTCATCCT

51 CGCCGCCTGC GGCGGTCAAA AAGATAGCGC GCCCGCCGCA TCCGCTTCTG

101 CCGCCGCCGA CAACGGCGCG GCGAANAAAG AAATCGTCTT CGGCACGACC

151 GTCGGCGACT TCGGCGATAT GGTCAAAGAA CANATCCAAC CCGAGCTGGA

201 GAAAAAAGGC TACACCGTCA AACTGGTCGA GTNTACCGAC TATGTGCGCN

251 CGAATCTGGC ATTGGCTGAG GGCGAGTTGG ACATCAACGT CTTNCAACAC

301 ANACNCTATC TTGACGACTN CAAAAAANAA CACAATCTGG ACATCACCNN

351 AGTCTTNCAA GTGCCGACCG CGCCTTTGGG ACTGTACCCG GGCAAGCTGA

401 AATCGCTGGA NNAAGTCAAA GANGGCAGCA CCGTATCCGC GCCCAACGAC

451 CCGTNNNACT TCGNCCGCGT CTTGGTGATG CTCGACGAAC TGGGTTNGAT

501 CAAACTCAAA GACNGCATCA NNNNGNNGNN NNNANCNANA NNNGANANNN

551 NNNNANNNNT NNNNNNNNNN NNNNNCNNCG NNNNNNNANN NNNNNNNNNN

601 NCGNNTNNNN NNGCNNNNNT NNANNNTNNN NNCNNCNNNN NNNNNTNNNN

651 NANNANNAGC GGCATGAAGC TGACCGAAGC CCTGTTCCAA GAACCGAGCT

701 TTGCCTATGT CAACTGGTCT GCCGTCAAAA CCGCCGACAA AGACAGCCAA

751 TGGCTTAAAG ACGTAACCGA GGCCTATAAC TCCGACGCGT TCAAAGCCTA

801 CGCGCACAAA CGCTTCGAGG GCTACAAATC CCCTGCCGCA TGGAATGAAG

851 GCGCAGCCAA ATAA



This is predicted to encode a protein having amino acid sequence <SEQ ID 220>: 



1 MKTFFKTLSA AALALILAAC GGQKDSAPAA SASAAADNGA AXKEIVFGTT 
51 VGDFGDMVKE XIQPELEKKG YTVKLVEXTD YVRXNLALAE GELDINVXQH

101 XXYLDDXKKX HNLDITXVXQ VPTAPLGLYP GKLKSLXXVK XGSTVSAPND 

151 PXXFXRVLVM LDELGXIKLK DXIXXXXXXX XXXXXXXXXX XXXXXXXXXX 

201 XXXXAXXXXX XXXXXXXXXS GMKLTEALFQ EPSFAYVNWS AVKTADKDSQ 

251 WLKDVTEAYN SDAFKAYAHK RFEGYKSPAA WNEGAAK* 

A leader peptide is underlined. 



Further analysis of these strain A sequences revealed the complete DNA sequence <SEQ ID 221>:

1 ATGAAAACCT TCTTCAAAAC CCTTTCCGCC GCCGCACTCG CGCTCATCCT 

51 CGCCGCCTGC GGCGGTCAAA AAGATAGCGC GCCCGCCGCA TCCGCTTCTG 

101 CCGCCGCCGA CAACGGCGCG GCGAAAAAAG AAATCGTCTT CGGCACGACC 

151 GTCGGCGACT TCGGCGATAT GGTCAAAGAA CAAATCCAAC CCGAGCTGGA 

201 GAAAAAAGGC TACACCGTCA AACTGGTCGA GTTTACCGAC TATGTGCGCC 

251 CGAATCTGGC ATTGGCTGAG GGCGAGTTGG ACATCAACGT CTTCCAACAC 

301 AAACCCTATC TTGACGACTT CAAAAAAGAA CACAATCTGG ACATCACCGA 

351 AGTCTTCCAA GTGCCGACCG CGCCTTTGGG ACTGTACCCG GGCAAGCTGA 

401 AATCGCTGGA AGAAGTCAAA GACGGCAGCA CCGTATCCGC GCCCAACGAC 

451 CCGTCCAACT TCGCCCGCGT CTTGGTGATG CTCGACGAAC TGGGTTGGAT 

501 CAAACTCAAA GACGGCATCA ATCCGCTGAC CGCATCCAAA GCGGACATTG 

551 CCGAAAACCT GAAAAACATC AAAATCGTCG AGCTTGAAGC CGCGCAACTG 

601 CCGCGTAGCC GCGCCGACGT GGATTTTGCC GTCGTCAACG GCAACTACGC 

651 CATAAGCAGC GGCATGAAGC TGACCGAAGC CCTGTTCCAA GAACCGAGCT 

701 TTGCCTATGT CAACTGGTCT GCCGTCAAAA CCGCCGACAA AGACAGCCAA 

751 TGGCTTAAAG ACGTAACCGA GGCCTATAAC TCCGACGCGT TCAAAGCCTA 

801 CGCGCACAAA CGCTTCGAGG GCTACAAATC CCCTGCCGCA TGGAATGAAG 

851 GCGCAGCCAA ATAA 

This encodes a protein having amino acid sequence <SEQ ID 222; ORF4a-1>:



1 MKTFFKTLSA AALALILAAC GGQKDSAPAA SASAAADNGA AKKEIVFGTT

51 VGDFGDMVKE QIQPELEKKG YTVKLVEFTD YVRPNLALAE GELDINVFQH 

101 KPYLDDFKKE HNLDITEVFQ VPTAPLGLYP GKLKSLEEVK DGSTVSAPND 

151 PSNFARVLVM LDELGWIKLK DGINPLTASK ADIAENLKNI KIVELEAAQL 

201 PRSRADVDFA VVNGNYAISS GMKLTEALFQ EPSFAYVNWS AVKTADKDSQ

251 WLKDVTEAYN SDAFKAYAHK RFEGYKSPAA WNEGAAK* 

ORF4a-1 and ORF4-1 show 99.7% identity in 287 aa overlap:



10 20 30 40 50 60 

orf4a-1 MKTFFKTLSAAALALILAACGGQKDSAPAASASAAADNGAAKKEIVFGTTVGDFGDMVKE

II I II I! I II II I! I II IIIIMMI II II I II II I! I ! I I I I IN M I MMMIII1 I 
orf4-1 MKTFFKTLSAAALALILAACGGQKDSAPAASASAAADNGAAKKEIVFGTTVGDFGDMVKE

10 20 30 40 50 60 

70 80 90 100 110 120 

orf4a-1 QIQPELEKKGYTVKLVEFTDYVRPNLALAEGELDINVFQHKPYLDDFKKEHNLDITEVFQ

III I Ml II II Mill lllll II II II I I III I M II II II Ml I I II I I I II I I III I 
orf4-1 QIQAELEKKGYTVKLVEFTDYVRPNLALAEGELDINVFQHKPYLDDFKKEHNLDITEVFQ

70 80 90 100 110 120 

130 140 150 160 170 180 

orf4a-1 VPTAPLGLYPGKLKSLEEVKDGSTVSAPNDPSNFARVLVMLDELGWIKLKDGINPLTASK

I I I I II I I I I I I I I I I I I I I I I I I I I II I I I II I I I I I I I I I I I I II I I I I I I I M I I I I 
orf4-1 VPTAPLGLYPGKLKSLEEVKDGSTVSAPNDPSNFARVLVMLDELGWIKLKDGINPLTASK

130 140 150 160 170 180 

190 200 210 220 230 240 

orf4a-1 ADIAENLKNIKIVELEAAQLPRSRADVDFAVVNGNYAISSGMKLTEALFQEPSFAYVNWS

II Ml MIIIIMM MM MMMI II M IMIMM I I II I II II II II I II IIMM 
orf4-1 ADIAENLKNIKIVELEAAQLPRSRADVDFAVVNGNYAISSGMKLTEALFQEPSFAYVNWS

190 200 210 220 230 240 



250 260 270 280 

orf4a-1 AVKTADKDSQWLKDVTEAYNSDAFKAYAHKRFEGYKSPAAWNEGAAKX
I I I I I I i I I I I I I I I I I I I I 1 1 1 1 1 1 I 1 I I I I I I ! 1 1 I I I I I I 1 1 I I I 
orf4-1 AVKTADKDSQWLKDVTEAYNSDAFKAYAHKRFEGYKSPAAWNEGAAKX

250 260 270 280 
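The alignment has no gaps across its 287-residue overlap, so 99.7% identity corresponds to a single difference (286/287). The sketch below lists substitutions between two equal-length, ungapped sequences. The function name and output format are illustrative only.

```python
from typing import List, Tuple


def substitutions(a: str, b: str, first_position: int = 1) -> List[Tuple[int, str, str]]:
    """Return (position, residue in a, residue in b) wherever two ungapped,
    equal-length aligned sequences differ; positions count from first_position."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return [
        (first_position + i, x, y)
        for i, (x, y) in enumerate(zip(a, b))
        if x != y
    ]


# Residues 51-70 of ORF4a-1 (SEQ ID 222) and ORF4-1 (SEQ ID 218), from the listings above.
diffs = substitutions("VGDFGDMVKEQIQPELEKKG", "VGDFGDMVKEQIQAELEKKG", first_position=51)
print(diffs)                                     # [(64, 'P', 'A')]
print(round(100 * (287 - len(diffs)) / 287, 1))  # 99.7
```

Comparing the two full listings the same way finds only this position, consistent with the reported 99.7%.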
Homology with an outer membrane protein of Pasteurella haemolytica (accession q08869).
ORF4 and this outer membrane protein show 33% aa identity in 91aa overlap: 

10 20 

lip2.pasha MNFKKLLGVALVSALALTACKDEKAQAP

I I I I I I I I : I I : I : I

ORF4 VXTPNPDGRTPCPSFLFETATTSGENMKTFFKTLSAAAL--ALILAACGFKKTARPPHPL

110 120 130 140 150 

30 40 50 60 70 80 

lip2.pasha -ATTAKTENKAPLKVGVMTGPEAQMTEVAVKIAKEKYGLDVELVQFTEYTQPNAALHSKD

: : : I : I : : I : : I : : : : II I I : I I : I I : I : : I I I I : 
ORF4 LPPPTTARRKKEIVFGTTVGDFGDMVKEQIQAELEKKGYTVKLVEFTDYVRPNLALAEGE 
160 170 180 190 200 210 

90 100 110 120 130 140

lip2.pasha LDANAFQTVPYLEQEVKDRGYKLAIIGNTLVWPIAAYSKKIKNISELKDGATVAIPNNAS
I 

ORF4 L 

Homology with a predicted ORF from N. gonorrhoeae

ORF4 shows 93.6% identity over a 94aa overlap with a predicted ORF (ORF4.ng) from N. 
gonorrhoeae: 

10 20 30 

orf4nm.pep MKTFFKTLSAAALALILAACGXQKDSAPAA

I I I I I I I II : I : II I I I I I II I I I i I I I I

orf4ng RANAVXTPNPDGRTPCLSFLFETATTSGENMKTFFKTLSTASLALILAACGGQKDSAPAA 
200 210 220 230 240 250 

40 50 60 70 80 89 

orf4nm.pep SASA-AADNGAAKKEIVFGTTVGDFGDMVKEQIQAELEKKGYTVKLVEFTDYVRPNLALA

||:| : I I I 11 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I II I I I I I I I I I I 
orf4ng SAAAPSADNGAAKKEIVFGTTVGDFGDMVKEQIQAELEKKGYTVKLVEFTDYVRPNLALA 
260 270 280 290 300 310 

90

orf4nm.pep EGEL 
MM 

orf4ng EGELDINVFQHKPYLDDFKKEHNLDITEAFQVPTAPLGLYPGKLKSLEEVKDGSTVSAPN 
320 330 340 350 360 370

The complete length ORF4ng nucleotide sequence <SEQ ID 223> was predicted to encode a
protein having amino acid sequence <SEQ ID 224>:

1 MKTFFKTLST ASLALILAAC GGQKDSAPAA SAAAPSADNG AAKKEIVFGT

51 TVGDFGDMVK EQIQAELEKK GYTVKLVEFT DYVRPNLALA EGELDINVFQ 

101 HKPYLDDFKK EHNLDITEAF QVPTAPLGLY PGKLKSLEEV KDGSTVSAPN 

151 DPSNFARALV MLNELGWIKL KDGINPLTAS KADIAENLKN IKIVELEAAQ

201 LPRSRADVDF AVVNGNYAIS SGMKLTEALF QEPSFAYVNW SAVKTADKDS

251 QWLKDVTEAY NSDAFKAYAH KRFEGYKYPA AWNEGAAK* 

Further analysis revealed the complete length ORF4ng DNA sequence <SEQ ID 225> to be: 

1 atgAAAACCT TCTTCAAAAC cctttccgcc gccgcaCTCG CGCTCATCCT 

51 CGCAGCCTGc ggCggtcaAA AAGACAGCGC GCCCgcagcc tctgcCGCCG

101 CCCCTTCTGC CGATAACGgc gCgGCGAAAA AAGAAAtcgt ctTCGGCACG 

151 Accgtgggcg acttcggcgA TAtggTCAAA GAACAAATCC AagcCGAgct 

201 gGAGAAAAAA GgctACACcg tcAAattggt cgaatttacc gactatgtGC 

251 gCCCGAATCT GGCATTGGCG GAGGGCGAGT TGGACATCAA CGTCTTCCAA 

301 CACAAACCCT ATCTTGACGA TTTCAAAAAA GAACACAACC TGGACATCAC

351 CGAAGCCTTC CAAGTGCCGA CCGCGCCTTT GGGACTGTAT CCGGGCAAAC 

401 TGAAATCGCT GGAAGAAGTC AAAGACGGCA GCACCGTATC CGCGCCCAac 

451 gACccgTCCA ACTTCGCACG CGCCTTGGTG ATGCTGAACG AACTGGGTTG 

501 GATCAAACTC AAAGACGGCA TCAATCCGCT GACCGCATCC AAAGCCGACA 

551 TCGCGGAAAA CCTGAAAAAC ATCAAAATCG TCGAGCTTGA AGCCGCACAA

601 CTGCCGCGCA GCCGCGCCGA CGTGGATTTT GCCGTCGTCA ACGGCAACTA

651 CGCCATAAGC AGCGGCATGA AGCTGACCGA AGCCCTGTTC CAAGAGCCGA 

701 GCTTTGCCTA TGTCAACTGG TCTGCCgtcA AAACCGCCGA CAAAGACAGC 

751 CAATGGCTTA AAGACGTAAC CGAGGCCTAT AACTCCGACG CGTTCAAAGC 

801 CTACGCGCAC AAACGCTTCG AGGGCTACAA ATACCCTGCC GCATGGAATG 

851 AAGGCGCAGC CAAATAA 

This encodes a protein having amino acid sequence <SEQ ID 226; ORF4ng-1>:

1 MKTFFKTLSA AALALILAAC GGQKDSAPAA SAAAPSADNG AAKKEIVFGT

51 TVGDFGDMVK EQIQAELEKK GYTVKLVEFT DYVRPNLALA EGELDINVFQ 

101 HKPYLDDFKK EHNLDITEAF QVPTAPLGLY PGKLKSLEEV KDGSTVSAPN 

151 DPSNFARALV MLNELGWIKL KDGINPLTAS KADIAENLKN IKIVELEAAQ 

201 LPRSRADVDF AVVNGNYAIS SGMKLTEALF QEPSFAYVNW SAVKTADKDS

251 QWLKDVTEAY NSDAFKAYAH KRFEGYKYPA AWNEGAAK* 
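
The statement that SEQ ID 225 encodes ORF4ng-1 can be checked by reading the DNA three bases at a time from the initial ATG with the standard genetic code. The following is a minimal sketch under stated assumptions: it uses the standard code (NCBI translation table 1), ignores letter case and the spaces used in the listings, and returns X for any codon containing an ambiguity letter (such as the m and y that appear in other listings in this document). `translate` is a hypothetical helper name, not part of the original work.

```python
BASES = "TCAG"
# Standard genetic code (NCBI table 1); amino acids in TCAG x TCAG x TCAG codon order.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate DNA in reading frame 1.

    Whitespace is removed and case is ignored; stop codons become '*' and any
    codon not in the table (e.g. one containing N, m or y) becomes 'X'.
    A trailing partial codon is ignored.
    """
    seq = "".join(dna.split()).upper()
    return "".join(
        CODON_TABLE.get(seq[pos:pos + 3], "X") for pos in range(0, len(seq) - 2, 3)
    )


if __name__ == "__main__":
    # Positions 1-60 of SEQ ID 225 as printed in the listing above.
    fragment = "atgAAAACCT TCTTCAAAAC cctttccgcc gccgcaCTCG CGCTCATCCT CGCAGCCTGc"
    print(translate(fragment))  # MKTFFKTLSAAALALILAAC
```

The printed string matches residues 1-20 of ORF4ng-1; the same call on the full listing can be compared residue by residue with SEQ ID 226.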

This shows 97.6% identity in a 288 aa overlap with ORF4-1:

10 20 30 40 50 59 

orf4-1.pep MKTFFKTLSAAALALILAACGGQKDSAPAASASA-AADNGAAKKEIVFGTTVGDFGDMVK
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I t I : I : I I I I I I I I I I I I I I I I I I I I I I I I 
orf4ng-1 MKTFFKTLSAAALALILAACGGQKDSAPAASAAAPSADNGAAKKEIVFGTTVGDFGDMVK

10 20 30 40 50 60 

60 70 80 90 100 110 119 

orf4-1.pep EQIQAELEKKGYTVKLVEFTDYVRPNLALAEGELDINVFQHKPYLDDFKKEHNLDITEVF
I I ! I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I ! I I I I I I I I I I I I I II I I I : I 
orf4ng-1 EQIQAELEKKGYTVKLVEFTDYVRPNLALAEGELDINVFQHKPYLDDFKKEHNLDITEAF

70 80 90 100 110 120 

120 130 140 150 160 170 179 

orf4-1.pep QVPTAPLGLYPGKLKSLEEVKDGSTVSAPNDPSNFARVLVMLDELGWIKLKDGINPLTAS
I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I : I I I I : I II I I I I I I I I I I I II I 
orf4ng-1 QVPTAPLGLYPGKLKSLEEVKDGSTVSAPNDPSNFARALVMLNELGWIKLKDGINPLTAS

130 140 150 160 170 180 

180 190 200 210 220 230 239 

orf4-1.pep KADIAENLKNIKIVELEAAQLPRSRADVDFAVVNGNYAISSGMKLTEALFQEPSFAYVNW

I I I II It I I I I I I I I M I II I I I I I I M I i I I M M I I I I I I I I i I II M II I M I I I I I 
orf4ng-1 KADIAENLKNIKIVELEAAQLPRSRADVDFAVVNGNYAISSGMKLTEALFQEPSFAYVNW

190 200 210 220 230 240 

240 250 260 270 280 

orf4-1.pep SAVKTADKDSQWLKDVTEAYNSDAFKAYAHKRFEGYKSPAAWNEGAAKX

I I I I I I I I I I I I I I I II I I I I I I I I I I II II II I I I I I I I I I I I II I I 
orf4ng-1 SAVKTADKDSQWLKDVTEAYNSDAFKAYAHKRFEGYKYPAAWNEGAAKX

250 260 270 280 
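
The identity figures quoted for these pairwise alignments (for example, 97.6% identity in a 288 aa overlap) express the share of aligned columns in which both sequences carry the same residue. A minimal sketch, assuming two rows of one alignment of equal length, '-' as the gap character, and an overlap that counts every column in which at least one row has a residue; the exact convention of the program used in the source is not stated, so figures computed this way may differ slightly. `percent_identity` is a hypothetical helper name.

```python
def percent_identity(seq_a: str, seq_b: str, gap: str = "-") -> tuple[float, int]:
    """Return (percent identity, overlap length) for two rows of one alignment.

    Assumption: both rows have equal length, `gap` marks a gap, and every
    column where at least one row has a residue counts toward the overlap.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned rows must have the same length")
    overlap = identical = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap and b == gap:
            continue  # an all-gap column contributes nothing
        overlap += 1
        if a == b:
            identical += 1
    if overlap == 0:
        return 0.0, 0
    return 100.0 * identical / overlap, overlap


if __name__ == "__main__":
    # First 40 columns of the ORF4-1 / ORF4ng-1 alignment above.
    orf4_1 = "MKTFFKTLSAAALALILAACGGQKDSAPAASASA-AADNG"
    orf4ng_1 = "MKTFFKTLSAAALALILAACGGQKDSAPAASAAAPSADNG"
    pct, n = percent_identity(orf4_1, orf4ng_1)
    print(f"{pct:.1f}% identity in {n} aa overlap")  # 92.5% identity in 40 aa overlap
```

A gap opposite a residue counts toward the overlap but not as an identity, which is why a single inserted residue lowers the percentage.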



In addition, ORF4ng-1 shows significant homology with an outer membrane protein from the
database:

ID LIP2_PASHA STANDARD; PRT; 276 AA. 

AC Q08869; 

DT 01-NOV-1995 (REL. 32, CREATED)

DT 01-NOV-1995 (REL. 32, LAST SEQUENCE UPDATE) 

DT 01-NOV-1995 (REL. 32, LAST ANNOTATION UPDATE) 

DE 28.2 KD OUTER MEMBRANE PROTEIN PRECURSOR. . . . 

SCORES Init1: 279 Initn: 416 Opt: 494

Smith-Waterman score: 494; 36.0% identity in 275 aa overlap 

10 20 30 40 50 

orf4ng-1.pep MKTFFKTLSAAAL--ALILAACGGQKDSAPAASAAAPSADNGAAKKEIVFGTTVGDFGDM
II I 111:11 :| I I I: :| ::l 

lip2 pasha MNFKKLLGVALVSALALTACKDEKAQAPATTA KTENKAPLK VGVMTGPEAQM 

10 20 30 40 50 

60 70 80 90 100 110 

orf4ng-1.pep VKEQIQAELEKKGYTVKLVEFTDYVRPNLALAEGELDINVFQHKPYLDDFKKEHNLDITE
: : : : III I : I I : I I : I : : I I I I : I I I : I I Ml:: | : : : : : 
lip2 pasha TEVAVKIAKEKYGLDVELVQFTEYTQPNAALHSKDLDANAFQTVPYLEQEVKDRGYKLAI

60 70 80 90 100 110

120 130 140 150 160 170 

orf4ng-1.pep AFQVPTAPLGLYPGKLKSLEEVKDGSTVSAPNDPSNFARALVMLNELGWIKLKDGINPLT

: : : I : : I I : I : : I : I I I : I I : I I : I I I I I I : : I : 1 : I I I I I : 
lip2 pasha IGNTLVWPIAAYSKKIKNISELKDGATVAIPNNASNTARALLLLQAHGLLKLKDPKN-VF 
120 130 140 150 160 170 

180 190 200 210 220 230

orf4ng-1.pep ASKADIAENLKNIKIVELEAAQLPRSRADVDFAVVNGNYAISSGMKLTE--ALFQEPSFA

I : : I I I I I I I I I I : : : : I I I I : : I I : I : : II : : I : : : : : : 
lip2 pasha ATENDIIENPKNIKIVQADTSLLTRMLDDVELAVINNTYAGQAGLSPDKDGIIVESKDSP 

180 190 200 210 220 230 


240 250 260 270 280 289 

orf4ng-1.pep YVNWSAVKTADKDSQWLKDVTEAYNSDAFKAYAHKRFEGYKYPAAWNEGAAKX

Ml : : : I I : I : ::::::: I I hi 

lip2 pasha YVNLVVSREDNKDDPRLQTFVKSFQTEEVFQEALKLFNGGWKGW 
240 250 260 270
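
The LIP2_PASHA entry above reports a Smith-Waterman score of 494 for the best local alignment with ORF4ng-1. The dynamic-programming recurrence behind such a score is sketched below. This is a simplified illustration with a flat match/mismatch score and a linear gap penalty (values chosen only for illustration); search programs of the kind that produced this report normally score with a substitution matrix such as BLOSUM or PAM and affine gap penalties, so this sketch will not reproduce the value 494. `smith_waterman` is a hypothetical helper name.

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Best local-alignment score with a flat match/mismatch score and linear gaps.

    Illustrative parameters only; they are not those behind the score above.
    """
    prev = [0] * (len(b) + 1)  # row i-1 of the dynamic-programming matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diagonal = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: scores never drop below zero, so a new
            # alignment may start at any cell.
            curr[j] = max(0, diagonal, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


if __name__ == "__main__":
    # Short stretches from the ORF4ng-1 / LIP2_PASHA alignment above.
    print(smith_waterman("WIKLKDGINPLT", "LLKLKDPKNVF"))
```

Keeping only the previous row makes memory linear in the length of `b`; recovering the alignment itself (rather than its score) would require keeping the full matrix or a traceback.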

Based on this analysis, including the homology with the outer membrane protein of Pasteurella
haemolytica, and on the presence of a putative prokaryotic membrane lipoprotein lipid attachment
site in the gonococcal protein, it was predicted that these proteins from N. meningitidis and
N. gonorrhoeae, and their epitopes, could be useful antigens for vaccines or diagnostics, or for
raising antibodies.
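
The putative lipoprotein lipid attachment site mentioned above is the kind of motif catalogued in PROSITE as PS00013. A minimal sketch that scans for it, assuming the PROSITE pattern {DERK}(6)-[LIVMFWSTAG](2)-[LIVMFYSTAGCQ]-[AGS]-C rewritten as a Python regular expression (check it against the current PROSITE entry before relying on it). A genuine site is expected at the end of an N-terminal signal peptide, so matches far from the N-terminus deserve caution. `lipobox_cysteines` is a hypothetical helper name.

```python
import re

# Assumed PROSITE PS00013 pattern (prokaryotic membrane lipoprotein lipid
# attachment site): {DERK}(6)-[LIVMFWSTAG](2)-[LIVMFYSTAGCQ]-[AGS]-C.
LIPOBOX = re.compile(r"[^DERK]{6}[LIVMFWSTAG]{2}[LIVMFYSTAGCQ][AGS]C")


def lipobox_cysteines(protein: str) -> list[int]:
    """1-based positions of the Cys ending each match (the putative lipidated residue)."""
    # m.end() is the 0-based index just past the Cys, i.e. its 1-based position.
    return [m.end() for m in LIPOBOX.finditer(protein.upper())]


if __name__ == "__main__":
    n_term = "MKTFFKTLSAAALALILAACGGQKDSAPAA"  # ORF4ng-1 residues 1-30
    print(lipobox_cysteines(n_term))  # [20]
```

In ORF4ng-1 the match ends at Cys20 (...LILAAC), the residue that would carry the lipid after signal-peptide processing if the prediction is correct.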

ORF4-1 (30 kDa) was cloned in pET and pGEX vectors and expressed in E. coli as described above.
The products of protein expression and purification were analyzed by SDS-PAGE. Figures 8A and
8B show, respectively, the results of affinity purification of the His-fusion and GST-fusion
proteins. Purified His-fusion protein was used to immunise mice, whose sera were used for ELISA
(positive result), Western blot (Figure 8C), FACS analysis (Figure 8D), and a bactericidal assay
(Figure 8E). These experiments confirm that ORF4-1 is a surface-exposed protein and a
useful immunogen.
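
The 30 kDa figure gives the approximate size of ORF4-1. A rough theoretical mass for an unmodified polypeptide can be estimated by summing average residue masses and adding one water for the free termini. A minimal sketch, assuming standard textbook average residue masses rounded to 0.01 Da and ignoring signal-peptide cleavage, lipidation and the His or GST fusion partners, any of which changes the size observed on a gel. `average_mass_kda` is a hypothetical helper name.

```python
# Average residue masses in daltons (free amino acid minus one water),
# rounded to 0.01 Da. Assumed textbook values, not taken from the source.
RESIDUE_MASS = {
    "G": 57.05, "A": 71.08, "S": 87.08, "P": 97.12, "V": 99.13,
    "T": 101.10, "C": 103.14, "L": 113.16, "I": 113.16, "N": 114.10,
    "D": 115.09, "Q": 128.13, "K": 128.17, "E": 129.12, "M": 131.19,
    "H": 137.14, "F": 147.18, "R": 156.19, "Y": 163.18, "W": 186.21,
}
WATER = 18.02  # added once for the free N- and C-termini


def average_mass_kda(protein: str) -> float:
    """Approximate average mass (kDa) of an unmodified polypeptide chain.

    Characters that are not one of the 20 standard residues ('*', 'X',
    spaces) are skipped, so the result is an underestimate when X is present.
    """
    masses = [RESIDUE_MASS[aa] for aa in protein.upper() if aa in RESIDUE_MASS]
    if not masses:
        return 0.0
    return (sum(masses) + WATER) / 1000.0


if __name__ == "__main__":
    # Illustration on the first 30 residues of ORF4ng-1; paste a full
    # sequence instead to compare with the reported size.
    print(round(average_mass_kda("MKTFFKTLSAAALALILAACGGQKDSAPAA"), 2))
```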

Figure 8F shows plots of hydrophilicity, antigenic index, and AMPHI regions for ORF4-1. 
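
Hydrophilicity plots of this kind are usually sliding-window averages of a per-residue scale. The scale and window used for Figure 8F are not stated; the sketch below assumes the Hopp-Woods hydrophilicity scale and a 7-residue window, both illustrative choices. The antigenic-index and AMPHI curves rely on different algorithms and are not reproduced here. `hydrophilicity_profile` is a hypothetical helper name.

```python
# Hopp-Woods hydrophilicity values (assumed scale; the figure's scale is not stated).
HOPP_WOODS = {
    "R": 3.0, "D": 3.0, "E": 3.0, "K": 3.0, "S": 0.3, "N": 0.2, "Q": 0.2,
    "G": 0.0, "P": 0.0, "T": -0.4, "A": -0.5, "H": -0.5, "C": -1.0,
    "M": -1.3, "V": -1.5, "I": -1.8, "L": -1.8, "Y": -2.3, "F": -2.5, "W": -3.4,
}


def hydrophilicity_profile(protein: str, window: int = 7) -> list[float]:
    """Mean hydrophilicity of each full window, in sequence order.

    Letters outside the 20 standard residues (e.g. X) score 0.0; non-letters
    such as '*' or spaces are dropped before windowing.
    """
    values = [HOPP_WOODS.get(aa, 0.0) for aa in protein.upper() if aa.isalpha()]
    if window < 1 or len(values) < window:
        return []
    return [sum(values[i:i + window]) / window for i in range(len(values) - window + 1)]


if __name__ == "__main__":
    n_term = "MKTFFKTLSAAALALILAACGGQKDSAPAA"  # ORF4ng-1 residues 1-30
    profile = hydrophilicity_profile(n_term)
    start = max(range(len(profile)), key=profile.__getitem__) + 1
    print(f"most hydrophilic 7-residue window starts at residue {start}")
```

Plotting each window mean against the window's centre residue gives a curve of the type shown in the figures; peaks mark hydrophilic, potentially surface-exposed stretches.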
Example 27

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 227>: 

1 CCTCGTCGTC CTCGGCATGC TCCAGTTTCA AGGGGCGATT TACTCCAAGG 

51 CGGTGGAACG TATGCTCGGC ACGGTCATCG GGCTGGGCGC GGGTTTGGGC 

101 GTTTTATGGC TGAACCAGCA TTATTTCCAC GGCAACCTCC TCTTCTACCT 

151 CACCGTCGGC ACGGCAAGCG CACTGGCCGG CTGGGCGGCG GTCGGCAAAA

201 ACGGCTACGT CCCTmTGCTG GCAGGGCTGA CGATGTGTAT GCTCATCGGC 

251 GACAACGGCA GCGAATGGCT CGACAGCGGA CTCATGCGCG CCATGAACGT 

301 CCTCATCGGC GyGGCCATCG CCATCGCCGC CGCCAAACTG CTGCCGCTGA 

351 AATCCACACT GATGTGGCGT TTCATGCTTG CCGACAACCT GGCCGACTGC 

401 AGCAAAATGA TTGCCGAAAT CAGCAACGGC AGGCGCATGA CCCGCGAACG

451 CCTCGAGGAG AACATGGCGA AAATGCGCCA AATCAACGCA CGCATGGTCA 

501 AAAGCCGCAG CCATCTCGCC GCCACATCGG GCGAAAGCTG CATCAGCCCC 

551 GCCATGATGG AAGCCATGCA GCACGCCCAC CGTAAAATCG TCAACACCAC 

601 CGAGCTGCTC CTGACCACCG CCGCCAAGCT GCAATCTCCC AAACTCAACG 
651 GCAGCGAAAT CCGGCTGCTT GACCGCCACT TCACACTGCT CCAAAC

701 GC AGACACGCCC GCCGCATCCG 

751 CATCGACACC GCCATCAACC CCGAACTGGA AGCCCTCGCC GAACACCTCC 

801 ACTACCAATG GCAGGGCTTC CTCTGGCTCA GCACCGATAT GCGTCAGGAA 

851 ATTTCCGCCC TCGTCATCCT GCTGCAACGC ACCCGCCGCA AATGGCTGGA

901 TGCCCACGAA CGCCAACACC TGCGCCAAAG CCTGCTTGA 

This corresponds to the amino acid sequence <SEQ ID 228; ORF8>: 



1 PRRP RHAPVSRGDL LQGGGTYARH GHRAGRGFGR FMAEPALFPR 

51 QPPLLPHRRH GKRTGRLGGG RQKRLRPXAG RADDVYAHRR QRQRMARQRT 

101 HARHERPHRR GHRHRRRQTA AAEIHTDVAF HACRQPGRLQ QNDCRNQQRQ

151 AHDPRTPRGE HGENAPNQRT HGQKPQPSRR HIGRKLHQPR HDGSHAARPP 

201 XNRQHHRAAP DHRRQAAISQ TQRQRNPAAX PPLHTAPN Q 

251 TRPPHPHRHR HQPRTGSPRR TPPLPMAGLP LAQHRYASGN FRPRHPAATH 

301 PPQMAGCPRT PTPAPKPA* 

Computer analysis of this amino acid sequence gave the following results:
Sequence motifs 

ORF8 is proline-rich, and its distribution of proline residues is consistent with surface
localization. Furthermore, the presence of an RGD motif may indicate a role in bacterial
adhesion.
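
The two sequence features cited for ORF8, a high proline content and an RGD tripeptide, can be located with simple string scanning. A minimal sketch; `motif_positions` and `residue_fraction` are hypothetical helper names, and the authors' criterion for a proline distribution "consistent with surface localization" is not given, so only overall proline content is computed here.

```python
def motif_positions(protein: str, motif: str) -> list[int]:
    """1-based start positions of every (possibly overlapping) motif occurrence."""
    seq = protein.upper()
    hits = []
    start = seq.find(motif)
    while start != -1:
        hits.append(start + 1)
        start = seq.find(motif, start + 1)
    return hits


def residue_fraction(protein: str, residue: str = "P") -> float:
    """Fraction of alphabetic positions occupied by `residue` (default proline)."""
    letters = [aa for aa in protein.upper() if aa.isalpha()]
    return letters.count(residue) / len(letters) if letters else 0.0


if __name__ == "__main__":
    orf8_start = "PRRPRHAPVSRGDLLQGGGTYARHGHRAGRGFGRFMAEPALFPR"  # ORF8 residues 1-44
    print(motif_positions(orf8_start, "RGD"))  # [11]
    print(round(residue_fraction(orf8_start), 3))
```

The same two calls can be run on the full ORF8 or ORF8ng sequences; a windowed version of `residue_fraction` would be needed to describe how the prolines are distributed along the chain.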



Homology with a predicted ORF from N. gonorrhoeae

ORF8 shows 86.5% identity over a 312aa overlap with a predicted ORF (ORF8.ng) from N.
gonorrhoeae:





orf8ng    1   MDRDDRLRRPRHAPVPRRDLLQRGGTYARYGHRAGRGFGRFMAEPALFPR  50
orf8.pep  1         PRRPRHAPVSRGDLLQGGGTYARHGHRAGRGFGRFMAEPALFPR  44

orf8ng    51  QPPLLFDHRHGKRTGRLGGGRQKRLRPYVGGADDVHAHRRQRQRMARQRP  100
orf8.pep  45  QPPLLPHRRHGKRTGRLGGGRQKRLRPXAGRADDVYAHRRQRQRMARQRT  94

orf8ng    101 DARDERPHRRRHRHCRRQTAAAEIHTDVAFHACRQPGRLQQNDCRNQQRQ  150
orf8.pep  95  HARHERPHRRGHRHRRRQTAAAEIHTDVAFHACRQPGRMQQNDCRNQQRQ  144

orf8ng    151 AYDARTFGAEYGQNAPNQRTHGQKPQPPRRHIGRKPHQPLHDGSHAARPP  200
orf8.pep  145 AHDPRTPRGEHGENAPNQRTHGQKPQPSRRHIGRKLHQPRHDGSHAARPP  194

orf8ng    201 QNRQHHRAAPDHRRQAAISQTQRQRNPAARPPLHTAPNRPATNRRPHQRQ  250
orf8.pep  195 XNRQHHRAAPDHRRQAAISQTQRQRNPAAXPPLHTAPN           Q  244

orf8ng    251 TRPPHPHRHRHQPRTGSPRRTPPLPMAGFPLAQHQYASGNFRPRHPPATH  300
orf8.pep  245 TRPPHPHRHRHQPRTGSPRRTPPLPMAGLPLAQHRYASGNFRPRHPAATH  294

orf8ng    301 PPQMAGCPRTPTPAPKPA*  319
orf8.pep  295 PPQMAGCPRTPTPAPKPA*  313

The complete length ORF8ng nucleotide sequence <SEQ ID 229> is predicted to encode a protein
having amino acid sequence <SEQ ID 230>:



1 MDRDDRLRRP RHAPVPRRDL LQRGGTYARY GHRAGRGFGR FMAEPALFPR 

51 QPPLLPDHRH GKRTGRLGGG RQKRLRPYVG GADDVHAHRR QRQRMARQRP 

101 DARDERPHRR RHRHCRRQTA AAEIHTDVAF HACRQPGRLQ QNDCRNQQRQ 

151 AYDARTFGAE YGQNAPNQRT HGQKPQPPRR HIGRKPHQPL HDGSHAARPP
201 QNRQHHRAAP DHRRQAAISQ TQRQRNPAAR PPLHTAPNRP ATNRRPHQRQ
251 TRPPHPHRHR HQPRTGSPRR TPPLPMAGFP LAQHQYASGN FRPRHPPATH 
301 PPQMAGCPRT PTPAPKPA* 

Based on the sequence motifs in these proteins, it is predicted that the proteins from N. meningitidis
and N. gonorrhoeae, and their epitopes, could be useful antigens for vaccines or diagnostics, or for
raising antibodies.



Example 28 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 231>:

1 ..GAAATCAGCC TGCGGTCCGA CNACAGGCCG GTTTCCGTGN CGAAGCGGCG

51 GGATTCGGAA CGTTTTCTGC TGTTGGACGG CGGCAACAGC CGGCTCAAGT

101 GGGCGTGGGT GGAAAACGGC ACGTTCGCAA CCGTCGGTAG CGCGCCGTAC

151 CGCGATTTGT CGCCTTTGGG CGCGGAGTGG GCGGAAAAGG CGGATGGAAA

201 TGTCCGCATC GTCGGTTGCG CTGTGTGCGG AGAATTCAAA AAGGCACAAG

251 TGCAGGAACA GCTCGCCCGA AAAATCGAGT GGCTGCCGTC TTCCGCACAG

301 GCTTT.GGCA TACGCAACCA CTACCGCCAC CCCGAAGAAC ACGGTTCCGA

351 CCGCTGGTTC AACGCCTTGG GCAGCCGCCG CTTCAGCCGC AACGCCTGCG

401 TCGTCGTCAG TTGCGGCACG GCGGTAACGG TTGACGCGCT CACCGATGAC

451 GGACATTATC TCGGAGA.GG AACCATCATG CCCGGTTTCC ACCTGATGAA

501 AGAATCGCTC GCCGTCCGAA CCGCCAACCT CAACCGGCAC GCCGGTAAGC

551 GTTATCCTTT CCCGACCGG..

This corresponds to the amino acid sequence <SEQ ID 232; ORF61>: 

1 ..EISLRSDXRP VSVXKRRDSE RFLLLDGGNS RLKWAWVENG TFATVGSAPY 

51 RDLSPLGAEW AEKADGNVRI VGCAVCGEFK KAQVQEQLAR KIEWLPSSAQ 

101 AXGIRNHYRH PEEHGSDRWF NALGSRRFSR NACVVVSCGT AVTVDALTDD

151 GHYLGXGTIM PGFHLMKESL AVRTANLNRH AGKRYPFPT..

Further work revealed the complete nucleotide sequence <SEQ ID 233>:

1 ATGACGGTTT TGAAGCTTTC GCACTGGCGG GTGTTGGCGG AGCTTGCCGA
51 CGGTTTGCCG CAACACGTCT CGCAACTGGC GCGTATGGCG GATATGAAGC
101 CGCAGCAGCT CAACGGTTTT TGGCAGCAGA TGCCGGCGCA CATACGCGGG
151 CTGTTGCGCC AACACGACGG CTATTGGCGG CTGGTGCGCC CATTGGCGGT
201 TTTCGATGCC GAAGGTTTGC GCGAGCTGGG GGAAAGGTCG GGTTTTCAGA
251 CGGCATTGAA GCACGAGTGC GCGTCCAGCA ACGACGAGAT ACTGGAATTG
301 GCGCGGATTG CGCCGGACAA GGCGCACAAA ACCATATGCG TGACCCACCT
351 GCAAAGTAAG GGCAGGGGGC GGCAGGGGCG GAAGTGGTCG CACCGTTTGG
401 GCGAGTGTCT GATGTTCAGT TTTGGCTGGG TGTTTGACCG GCCGCAGTAT
451 GAGTTGGGTT CGCTGTCGCC TGTTGCGGCA GTGGCGTGTC GGCGCGCCTT
501 GTCGCGTTTA GGTTTGGATG TGCAGATTAA GTGGCCCAAT GATTTGGTTG
551 TCGGACGCGA CAAATTGGGC GGCATTCTGA TTGAAACGGT CAGGACGGGC
601 GGCAAAACGG TTGCCGTGGT CGGTATCGGC ATCAATTTTG TCCTGCCCAA
651 GGAAGTAGAA AATGCCGCTT CCGTGCAATC GCTGTTTCAG ACGGCATCGC
701 GGCGGGGCAA TGCCGATGCC GCCGTGCTGC TGGAAACGCT GTTGGTGGAA
751 CTGGACGCGG TGTTGTTGCA ATATGCGCGG GACGGATTTG CGCCTTTTGT
801 GGCGGAATAT CAGGCTGCCA ACCGCGACCA CGGCAAGGCG GTATTGCTGT
851 TGCGCGACGG CGAAACCGTG TTCGAAGGCA CGGTTAAAGG CGTGGACGGA
901 CAAGGCGTTT TGCACTTGGA AACGGCAGAG GGCAAACAGA CGGTCGTCAG
951 CGGCGAAATC AGCCTGCGGT CCGACGACAG GCCGGTTTCC GTGCCGAAGC
1001 GGCGGGATTC GGAACGTTTT CTGCTGTTGG ACGGCGGCAA CAGCCGGCTC
1051 AAGTGGGCGT GGGTGGAAAA CGGCACGTTC GCAACCGTCG GTAGCGCGCC
1101 GTACCGCGAT TTGTCGCCTT TGGGCGCGGA GTGGGCGGAA AAGGCGGATG
1151 GAAATGTCCG CATCGTCGGT TGCGCTGTGT GCGGAGAATT CAAAAAGGCA
1201 CAAGTGCAGG AACAGCTCGC CCGAAAAATC GAGTGGCTGC CGTCTTCCGC
1251 ACAGGCTTTG GGCATACGCA ACCACTACCG CCACCCCGAA GAACACGGTT
1301 CCGACCGCTG GTTCAACGCC TTGGGCAGCC GCCGCTTCAG CCGCAACGCC
1351 TGCGTCGTCG TCAGTTGCGG CACGGCGGTA ACGGTTGACG CGCTCACCGA
1401 TGACGGACAT TATCTCGGGG GAACCATCAT GCCCGGTTTC CACCTGATGA
1451 AAGAATCGCT CGCCGTCCGA ACCGCCAACC TCAACCGGCA CGCCGGTAAG
1501 CGTTATCCTT TCCCGACCAC AACGGGCAAT GCCGTCGCCA GCGGCATGAT
1551 GGATGCGGTT TGCGGCTCGG TTATGATGAT GCACGGGCGT TTGAAAGAAA
1601 AAACCGGGGC GGGCAAGCCT GTCGATGTCA TCATTACCGG CGGCGGCGCG
1651 GCAAAAGTTG CCGAAGCCCT GCCGCCTGCA TTTTTGGCGG AAAATACCGT
1701 GCGCGTGGCG GACAACCTCG TCATTTACGG GTTGTTGAAC ATGATTGCCG 
1751 CCGAAGGCAG GGAATATGAA CATATTTAA 
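
The sequence listings in this document share one layout: each row begins with the 1-based position of its first base, followed by the sequence in blocks of ten, normally fifty bases per row. A small parser sketch can rejoin such rows into one string and check that each row's leading number matches the running position, which catches rows lost or duplicated in transcription. `parse_listing` is a hypothetical helper, not part of the original work.

```python
def parse_listing(text: str, first: int = 1) -> str:
    """Rejoin a numbered sequence listing into one upper-case string.

    Each non-blank row must start with the 1-based position of its first
    base; `first` is the position expected on the first row.
    """
    blocks: list[str] = []
    length = 0
    for row in text.splitlines():
        fields = row.split()
        if not fields:
            continue  # blank separator rows
        start = int(fields[0])
        if start != first + length:
            raise ValueError(f"row labelled {start}, expected {first + length}")
        for block in fields[1:]:
            blocks.append(block)
            length += len(block)
    return "".join(blocks).upper()


if __name__ == "__main__":
    tail = """1701 GCGCGTGGCG GACAACCTCG TCATTTACGG GTTGTTGAAC ATGATTGCCG
1751 CCGAAGGCAG GGAATATGAA CATATTTAA"""
    seq = parse_listing(tail, first=1701)
    print(len(seq), seq[-3:])  # 79 TAA
```

Run on the complete listing above, the rejoined sequence ends at position 1779 with the TAA stop codon.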

This corresponds to the amino acid sequence <SEQ ID 234; ORF61-1>:



1 MTVLKLSHWR VLAELADGLP QHVSQLARMA DMKPQQLNGF WQQMPAHIRG 

51 LLRQHDGYWR LVRPLAVFDA EGLRELGERS GFQTALKHEC ASSNDEILEL 

101 ARIAPDKAHK TICVTHLQSK GRGRQGRKWS HRLGECLMFS FGWVFDRPQY 

151 ELGSLSPVAA VACRRALSRL GLDVQIKWPN DLVVGRDKLG GILIETVRTG

201 GKTVAVVGIG INFVLPKEVE NAASVQSLFQ TASRRGNADA AVLLETLLVE

251 LDAVLLQYAR DGFAPFVAEY QAANRDHGKA VLLLRDGETV FEGTVKGVDG 

301 QGVLHLETAE GKQTVVSGEI SLRSDDRPVS VPKRRDSERF LLLDGGNSRL

351 KWAWVENGTF ATVGSAPYRD LSPLGAEWAE KADGNVRIVG CAVCGEFKKA 

401 QVQEQLARKI EWLPSSAQAL GIRNHYRHPE EHGSDRWFNA LGSRRFSRNA 

451 CVVVSCGTAV TVDALTDDGH YLGGTIMPGF HLMKESLAVR TANLNRHAGK

501 RYPFPTTTGN AVASGMMDAV CGSVMMMHGR LKEKTGAGKP VDVIITGGGA 

551 AKVAEALPPA FLAENTVRVA DNLVIYGLLN MIAAEGREYE HI* 

Figure 9 shows plots of hydrophilicity, antigenic index, and AMPHI regions for ORF61-1. Further 
computer analysis of this amino acid sequence gave the following results: 



Homology with the baf protein of B. pertussis (accession number U12020)
ORF61 and the baf protein show 33% aa identity in a 166aa overlap:

orf61 23 LLLDGGNSRLKWAWVE-NGTFATVGSAPYR DLSPLGAEWAEKADGNVRIVGCAVCG 77 

+L+D GNSRLK W + + A AP DL LG A R +G V G 

baf 3 ILIDSGNSRLKVGWFDPDAPQAAREPAPVAFDNLDLDALGRWLATLPRRPQRALGVNVAG 62 



orf61 78 EFKKAQVQEQLAR KIEWLPSSAQAXGIRNHYRHPEEHGSDRW FNALGSRRFSRN 131 

+ + L I WL + A G+RN YR+P++ G+DRW L + 

baf 63 LARGEAIAATLRAGGCDIRWLRAQPLAMGLRNGYRNPDQLGADRWACMVGVLARQPSVHP 122 

orf61 132 ACVVVSCGTAVTVDALTDDGHYLGXGTIMPGFHLMKESLAVRTANL 177

+V S GTA T+D + D + G G I+PG +M+ +LA TA+L 
baf 123 PLLVASFGTATTLDTIGPDNVFPG-GLILPGPAMMRGALAYGTAHL 167 
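
The ORF61/baf alignment above uses a BLAST-style middle line, in which a letter marks an identical residue, '+' a conservative (positive-scoring) substitution and a blank neither. Assuming a correctly spaced middle line, identities and positives can be tallied as below, and the identity percentage is presumably identities divided by the aligned length. Note that the middle lines in this copy of the text have lost some spacing, so counts taken from them directly may not reproduce the reported 33%. `midline_counts` is a hypothetical helper name.

```python
def midline_counts(midline: str) -> tuple[int, int]:
    """Count (identities, positives) on a BLAST-style alignment middle line.

    A letter marks an identical residue, '+' a positive-scoring substitution,
    and a space a non-positive pair or a gap; positives include identities.
    """
    identities = sum(1 for ch in midline if ch.isalpha())
    return identities, identities + midline.count("+")


if __name__ == "__main__":
    ident, pos = midline_counts("+L+D GNSRLK W")
    print(ident, pos)  # 9 11
```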



Homology with a predicted ORF from N. meningitidis (strain A)

ORF61 shows 97.4% identity over a 189aa overlap with an ORF (ORF61a) from strain A of N. 
meningitidis: 



10 20 30 

orf61.pep EISLRSDXRPVSVXKRRDSERFLLLDGGNS

I I I I I 1 I I I I I I I I I I I I I I I I I I I I I I 
orf61a TVFEGTVKGVDGQGVLHLETAEGKQTVVSGEISLRSDDRPVSVPKRRDSERFLLLDGGNS
290 300 310 320 330 340 



40 50 60 70 80 90 

orf61.pep RLKWAWVENGTFATVGSAPYRDLSPLGAEWAEKADGNVRIVGCAVCGEFKKAQVQEQLAR
IMI I MM I II II MM I I MM II II I M 11:1 M II M I I I M Ml I II I II III! I 
orf61a RLKWAWVENGTFATVGSAPYRDLSPLGAEWAEKVDGNVRIVGCAVCGEFKKAQVQEQLAR
350 360 370 380 390 400 



100 110 120 130 140 150 

orf61.pep KIEWLPSSAQAXGIRNHYRHPEEHGSDRWFNALGSRRFSRNACVVVSCGTAVTVDALTDD
I M M M M M M M M M M M M M M M M M M I M M M M M M M M M M I 
orf61a KIEWLPSSAQALGIRNHYRHPEEHGSDRWFNALGSRRFSRNACVVVSCGTAVTVDALTDD
410 420 430 440 450 460 



160 170 180 189 

orf61.pep GHYLGXGTIMPGFHLMKESLAVRTANLNRHAGKRYPFPT
MM! M M M M M M M M M M M I M M M M M 
orf61a GHYLG-GTIMPGFHLMKESLAVRTANLNRHAGKRYPFPTTTGNAVASGMMDAVCGSVMMM
470 480 490 500 510 520 



orf61a HGRLKEKTGAGKPVDVIITGGGAAKVAEALPPAFLAENTVRVADNLVIHGLLNLIAAEGG
530 540 550 560 570 580

The complete length ORF61a nucleotide sequence <SEQ ID 235> is:

1 ATGACGGTTT TGAAGCCTTC GCACTGGCGG GTGTTGGCGG AGCTTGCCGA
51 CGGTTTGCCG CAACACGTCT CGCAACTGGC GCGTATGGCG GATATGAAGC
101 CGCAGCAGCT CAACGGTTTT TGGCAGCAGA TGCCGGCGCA CATACGCGGG
151 CTGTTGCGCC AACACGACGG CTATTGGCGG CTGGTGCGCC CATTGGCGGT
201 TTTCGATGCC GAAGGTTTGC GCGAGCTGGG GGAAAGGTCG GGTTTTCAGA
251 CGGCATTGAA GCACGAGTGC GCGTCCAGCA ACGACGAGAT ACTGGAATTG
301 GCGCGGATTG CGCCGGACAA GGCGCACAAA ACCATATGTG TGACCCACCT
351 GCAAAGTAAG GGCAGGGGGC GGCAGGGGCG GAAGTGGTCG CACCGTTTGG
401 GCGAGTGTCT GATGTTCAGT TTTGGCTGGG TGTTTGACCG GCCGCAGTAT
451 GAGTTGGGTT CGCTGTCGCC TGTTGCGGCA GTGGCGTGCC GGCGCGCCTT
501 GTCGCGTTTG GGTTTGAAAA CGCAAATCAA GTGGCCAAAC GATTTGGTCG
551 TCGGACGCGA CAAATTGGGC GGCATTCTGA TTGAAACGGT CAGGACGGGC
601 GGCAAAACGG TTGCCGTGGT CGGTATCGGC ATCAATTTCG TGCTGCCCAA
651 GGAAGTGGAA AACGCCGCTT CCGTGCAATC GCTGTTTCAG ACGGCATCGC
701 GGCGGGGAAA TGCCGATGCC GCCGTGTTGC TGGAAACGCT GTTGGCGGAA
751 CTTGATGCGG TGTTGTTGCA ATATGCGCGG GACGGATTTG CGCCTTTTGT
801 GGCGGAATAT CAGGCTGCCA ACCGCGACCA CGGCAAGGCG GTATTGCTGT
851 TGCGCGACGG CGAAACCGTG TTCGAAGGCA CGGTTAAAGG CGTGGACGGA
901 CAAGGCGTTC TGCACTTGGA AACGGCAGAG GGCAAACAGA CGGTCGTCAG
951 CGGCGAAATC AGCCTGCGGT CCGACGACAG GCCGGTTTCC GTGCCGAAGC
1001 GGCGGGATTC GGAACGTTTT CTGCTGTTGG ACGGCGGCAA CAGCCGGCTC
1051 AAGTGGGCGT GGGTGGAAAA CGGCACGTTC GCAACCGTCG GTAGCGCGCC
1101 GTACCGCGAT TTGTCGCCTT TGGGCGCGGA GTGGGCGGAA AAGGTGGATG
1151 GAAATGTCCG CATCGTCGGT TGCGCCGTGT GCGGAGAATT CAAAAAGGCA
1201 CAAGTGCAGG AACAGCTCGC CCGAAAAATC GAGTGGCTGC CGTCTTCCGC
1251 ACAGGCTTTG GGCATACGCA ACCACTACCG CCACCCCGAA GAACACGGTT
1301 CCGACCGCTG GTTCAACGCC TTGGGCAGCC GCCGCTTCAG CCGCAACGCC
1351 TGCGTCGTCG TCAGTTGCGG CACGGCGGTA ACGGTTGACG CGCTCACCGA
1401 TGACGGACAT TATCTCGGGG GAACCATCAT GCCCGGTTTC CACCTGATGA
1451 AAGAATCGCT CGCCGTCCGA ACCGCCAACC TCAACCGGCA CGCCGGTAAG
1501 CGTTATCCTT TCCCGACCAC AACGGGCAAT GCCGTCGCCA GCGGCATGAT
1551 GGATGCGGTT TGCGGCTCGG TTATGATGAT GCACGGGCGT TTGAAAGAAA
1601 AAACCGGGGC GGGCAAGCCT GTCGATGTCA TCATTACCGG CGGCGGCGCG
1651 GCAAAAGTTG CCGAAGCCCT GCCGCCTGCA TTTTTGGCGG AAAATACCGT
1701 GCGCGTGGCG GACAACCTCG TCATTCACGG GCTGCTGAAC CTGATTGCCG
1751 CCGAAGGCGG GGAATCGGAA CATACTTAA

This encodes a protein having amino acid sequence <SEQ ID 236>: 



1 MTVLKPSHWR VLAELADGLP QHVSQLARMA DMKPQQLNGF WQQMPAHIRG 

51 LLRQHDGYWR LVRPLAVFDA EGLRELGERS GFQTALKHEC ASSNDEILEL 

101 ARIAPDKAHK TICVTHLQSK GRGRQGRKWS HRLGECLMFS FGWVFDRPQY 

151 ELGSLSPVAA VACRRALSRL GLKTQIKWPN DLVVGRDKLG GILIETVRTG

201 GKTVAVVGIG INFVLPKEVE NAASVQSLFQ TASRRGNADA AVLLETLLAE

251 LDAVLLQYAR DGFAPFVAEY QAANRDHGKA VLLLRDGETV FEGTVKGVDG 

301 QGVLHLETAE GKQTVVSGEI SLRSDDRPVS VPKRRDSERF LLLDGGNSRL

351 KWAWVENGTF ATVGSAPYRD LSPLGAEWAE KVDGNVRIVG CAVCGEFKKA 

401 QVQEQLARKI EWLPSSAQAL GIRNHYRHPE EHGSDRWFNA LGSRRFSRNA 

451 CVVVSCGTAV TVDALTDDGH YLGGTIMPGF HLMKESLAVR TANLNRHAGK

501 RYPFPTTTGN AVASGMMDAV CGSVMMMHGR LKEKTGAGKP VDVIITGGGA 

551 AKVAEALPPA FLAENTVRVA DNLVIHGLLN LIAAEGGESE HT* 

ORF61a and ORF61-1 show 98.5% identity in 591 aa overlap: 

10 20 30 40 50 60 

orf61a.pep MTVLKPSHWRVLAELADGLPQHVSQLARMADMKPQQLNGFWQQMPAHIRGLLRQHDGYWR

Mill I I I I I i I I II I I I I I! I I I I i I I I I I I f I I I I I I I I I i I 1 I I I 1 I I I I I I I I I I
orf61-1 MTVLKLSHWRVLAELADGLPQHVSQLARMADMKPQQLNGFWQQMPAHIRGLLRQHDGYWR
10 20 30 40 50 60



70 80 90 100 110 120 

orf61a.pep LVRPLAVFDAEGLRELGERSGFQTALKHECASSNDEILELARIAPDKAHKTICVTHLQSK
I | I | | | | | | || I I I I I I I I I I I I I I I I I I I I I I I t I I I I I I I I I I I I I I ! I I I I I I I I I I 
orf61-1 LVRPLAVFDAEGLRELGERSGFQTALKHECASSNDEILELARIAPDKAHKTICVTHLQSK

70 80 90 100 110 120 



130 140 150 160 170 180 

orf61a.pep GRGRQGRKWSHRLGECLMFSFGWVFDRPQYELGSLSPVAAVACRRALSRLGLKTQIKWPN
I I I I I I I I I I It I I I I I II I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I : I I I i I I 
orf61-1 GRGRQGRKWSHRLGECLMFSFGWVFDRPQYELGSLSPVAAVACRRALSRLGLDVQIKWPN

130 140 150 160 170 180 

190 200 210 220 230 240 

orf61a.pep DLVVGRDKLGGILIETVRTGGKTVAVVGIGINFVLPKEVENAASVQSLFQTASRRGNADA

I M I I I I I I II I I I I I I I I I I I I I I I I II I I M I I I I I I I I I II I I I I I I I I I I I I I I I I 
orf61-1 DLVVGRDKLGGILIETVRTGGKTVAVVGIGINFVLPKEVENAASVQSLFQTASRRGNADA

190 200 210 220 230 240 

250 260 270 280 290 300 

orf61a.pep AVLLETLLAELDAVLLQYARDGFAPFVAEYQAANRDHGKAVLLLRDGETVFEGTVKGVDG
I I I I I I I I : I I I I I I I II I I I I I I I II I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
orf61-1 AVLLETLLVELDAVLLQYARDGFAPFVAEYQAANRDHGKAVLLLRDGETVFEGTVKGVDG

250 260 270 280 290 300 

310 320 330 340 350 360 

orf61a.pep QGVLHLETAEGKQTVVSGEISLRSDDRPVSVPKRRDSERFLLLDGGNSRLKWAWVENGTF
I I I I I I 1 I J I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
orf61-1 QGVLHLETAEGKQTVVSGEISLRSDDRPVSVPKRRDSERFLLLDGGNSRLKWAWVENGTF

310 320 330 340 350 360 

370 380 390 400 410 420 

orf61a.pep ATVGSAPYRDLSPLGAEWAEKVDGNVRIVGCAVCGEFKKAQVQEQLARKIEWLPSSAQAL
I I I I I I I I I I I I I I I I I I II I : I I I I I I I I I I I I II I I I I I I I I I I I I II I II I I I I I I I 
orf61-1 ATVGSAPYRDLSPLGAEWAEKADGNVRIVGCAVCGEFKKAQVQEQLARKIEWLPSSAQAL

370 380 390 400 410 420 

430 440 450 460 470 480 

orf61a.pep GIRNHYRHPEEHGSDRWFNALGSRRFSRNACVVVSCGTAVTVDALTDDGHYLGGTIMPGF
I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I 
orf61-1 GIRNHYRHPEEHGSDRWFNALGSRRFSRNACVVVSCGTAVTVDALTDDGHYLGGTIMPGF

430 440 450 460 470 480 

490 500 510 520 530 540 

orf61a.pep HLMKESLAVRTANLNRHAGKRYPFPTTTGNAVASGMMDAVCGSVMMMHGRLKEKTGAGKP
I I I I I I I I I II I I I I I I I I I II I I I I I I I I I I I I I I II I I I I I I I II I I I I I I I I I I I I I 
orf61-1 HLMKESLAVRTANLNRHAGKRYPFPTTTGNAVASGMMDAVCGSVMMMHGRLKEKTGAGKP

490 500 510 520 530 540 

550 560 570 580 590 

orf61a.pep VDVIITGGGAAKVAEALPPAFLAENTVRVADNLVIHGLLNLIAAEGGESEHTX

I I I I I I I I I I I I I I I I I 1 I I I I I I I I I I I I I I I I I : I I H : I I I II I II 
orf61-1 VDVIITGGGAAKVAEALPPAFLAENTVRVADNLVIYGLLNMIAAEGREYEHIX

550 560 570 580 590 






Homology with a predicted ORF from N. gonorrhoeae 

ORF61 shows 94.2% identity over a 189aa overlap with a predicted ORF (ORF61.ng) from N. 
gonorrhoeae: 



orf61.pep                               EISLRSDXRPVSVXKRRDSERFLLLDGGNS  30
orf61ng   TVCEGTVKGVDGRGVLHLETAEGEQTVVSGEISLRPDNRSVSVPKRPDSERFLLLEGGNS  211

orf61.pep RLKWAWVENGTFATVGSAPYRDLSPLGAEWAEKADGNVRIVGCAVCGEFKKAQVQEQLAR  90
orf61ng   RLKWAWVENGTFATVGSAPYRDLSPLGAEWAEKADGNVRIVGCAVCGESKKAQVKEQLAR  271

orf61.pep KIEWLPSSAQAXGIRNHYRHPEEHGSDRWFNALGSRRFSRNACVVVSCGTAVTVDALTDD  150
orf61ng   KIEWLPSSAQALGIRNHYRHPEEHGSDRWFNALGSRRFSRNACVVVSCGTAVTVDALTDD  331

orf61.pep GHYLGXGTIMPGFHLMKESLAVRTANLNRHAGKRYPFPT  189
orf61ng   GHYLG-GTIMPGFHLMKESLAVRTANLNRPAGKRYPFPTTTGNAVASGMMDAVCGSIMMM  390

An ORF61ng nucleotide sequence <SEQ ID 237> was predicted to encode a protein having amino
acid sequence <SEQ ID 238>:

1 MFSFGWAFDR PQYELGSLSP VAALACRRAL GCLGLETQIK WPNDLVVGRD

51 KLGGILIETV RAGGKTVAVV GIGINFVLPK EVENAASVQS LFQTASRRGN 

101 ADAAVLLETL LAELGAVLEQ YAEEGFAPFL NEYETANRDH GKAVLLLRDG 

151 ETVCEGTVKG VDGRGVLHLE TAEGEQTVVS GEISLRPDNR SVSVPKRPDS 

201 ERFLLLEGGN SRLKWAWVEN GTFATVGSAP YRDLSPLGAE WAEKADGNVR 

251 IVGCAVCGES KKAQVKEQLA RKIEWLPSSA QALGIRNHYR HPEEHGSDRW 

301 FNALGSRRFS RNACVVVSCG TAVTVDALTD DGHYLGGTIM PGFHLMKESL

351 AVRTANLNRP AGKRYPFPTT TGNAVASGMM DAVCGSIMMM HGRLKEKNGA

401 GKPVDVIITG GGAAKVAEAL PPAFLAENTV RVADNLVIHG LLNLIAAEGG 

451 ESEHA* 

Further analysis revealed the complete gonococcal DNA sequence <SEQ ID 239> to be: 

1 ATGACGGTTT TGAAGCCTTC GCATTGGCGG GTGTTGGCGG AGCTTGCCGA 

51 CGGTTTGCCG CAACACGTAT CGCAATTGGC GCGTGAGGCG GACATGAAGC 

101 CGCAGCAGCT CAACGGTTTT TGGCAGCAGA TGCCGGCGCA TATACGCGGG 

151 CTGTTGCGCC AACACGACGG CTATTGGCGG CTGGTGCGCC CCTTGGCGGT 

201 TTTCGATGCC GAAGGTTTGC GCGATCTGGG GGAAAGGTCG GGTTTTCAGA 

251 CGGCATTGAA GCACGAGTGC GCGTCCAGCA ACGACGAGAT ACTGGAATTG 

301 GCGCGGATTG CGCCGGACAA GGCGCACAAA ACCATATGCG TGACCCACCT 

351 GCAAAGTAAG GGCAGGGGGC GGCAGGGGCG GAAGTGGTCG CACCGTTTGG 

401 GCGAGTGCCT GATGTTCAGT TTCGGCTGGG CGTTTGACCG GCCGCAGTAT 

451 GAGTTGGGTT CGCTGTCGCC TGTTGCGGCA CTTGCGTGCC GGCGCGCTTT 

501 GGGGTGTTTG GGTTTGGAAA CGCAAATCAA GTGGCCAAAC GATTTGGTCG 

551 TCGGACGCGA CAAATTGGGC GGCATTCTGA TTGAAACAGT CAGGGCGGGC 

601 GGTAAAACGG TTGCCGTGGT CGGTATCGGC ATCAATTTCG TGCTGCCCAA 

651 GGAAGTGGAA AACGCCGCTT CCGTGCAGTC GCTGTTTCAG ACGGCATCGC 

701 GGCGGGGCAA TGCCGATGCC GCCGTATTGC TGGAAACATT GCTTGCGGAA 

751 CTGGGCGCGG TGTTGGAACA ATATGCGGAA GAAGGGTTCG CGCCATTTTT 

801 AAATGAGTAT GAAACGGCCA ACCGCGACCA CGGCAAGGCG GTATTGCTGT 

851 TGCGCGACGG CGAAACCGTG TGCGAAGGCA CGGTTAAAGG CGTGGACGGA 

901 CGAGGCGTTC TGCACTTGGA AACGGCAgaa ggcgaACAGa cggtcgtcag 

951 cggcgaaaTC AGcctGCggc ccgacaacaG GTCGGtttcc gtgccgaagc 

1001 ggccggatTC GgaacgtTTT tTGCtgttgg aaggcgggaa cagccgGCTC 

1051 AAGTGGGCGT GggtggAAAa cggcacgttc gcaaccgtgg gcagcgcgCc 

1101 gtaCCGCGAT TTGTCGCCTT TGGGCGCGGA GTGGGCGGAA AAGGCGGATG 

1151 GAAATGTCCG CATCGTCGGT TGCGCCGTGT GCGGAGAATC CAAAAAGGCA 

1201 CAAGTGAAGG AACAGCTCGC CCGAAAAATC GAGTGGCTGC CGTCTTCCGC 

1251 ACAGGCTTTG GGCATACGCA ACCACTACCG CCACCCCGAA GAACACGGTT 

1301 CCGACCGTTG GTTCAACGCC TTGGGCAGCC GCCGCTTCAG CCGCAACGCC 

1351 TGCGTCGTCG TCAGTTGCGG CACGGCGGTA ACGGTTGACG CGCTCACCGA 

1401 TGACGGACAT TATCTCGGCG GAACCATCAT GCCCGGCTTC CACCTGATGA 

1451 AAGAATCGCT CGCCGTCCGA ACCGCCAACC TCAACCGCCC CGCCGGCAAA 

1501 CGTTACCCTT TCCCGACCAC AACGGGCAAC GCCGTCGCAA GCGGCATGAT 

1551 GGACGCGGTT TGCGGCTCGA TAATGATGAT GCACGGCCGT TTGAAAGAAA 

1601 AAAACGGCGC GGGCAAGCCT GTCGATGTCA TCATTACCGG CGGCGGCGCG 

1651 GCGAAAGTCG CCGAAGCCCT GCCGCCTGCA TTTTTGGCGG AAAATACCGT 

1701 GCGCGTGGCG GACAACCTCG TCATCCACGG GCTGCTGAAC CTGATTGCCG 

1751 CCGAAGGCGG GGAATCGGAA CACGCTTAA 

This corresponds to the amino acid sequence <SEQ ID 240; ORF61ng-1>:

1 MTVLKPSHWR VLAELADGLP QHVSQLAREA DMKPQQLNGF WQQMPAHIRG

51 LLRQHDGYWR LVRPLAVFDA EGLRDLGERS GFQTALKHEC ASSNDEILEL 

101 ARIAPDKAHK TICVTHLQSK GRGRQGRKWS HRLGECLMFS FGWAFDRPQY 

151 ELGSLSPVAA LACRRALGCL GLETQIKWPN DLVVGRDKLG GILIETVRAG

201 GKTVAVVGIG INFVLPKEVE NAASVQSLFQ TASRRGNADA AVLLETLLAE

251 LGAVLEQYAE EGFAPFLNEY ETANRDHGKA VLLLRDGETV CEGTVKGVDG 

301 RGVLHLETAE GEQTVVSGEI SLRPDNRSVS VPKRPDSERF LLLEGGNSRL

351 KWAWVENGTF ATVGSAPYRD LSPLGAEWAE KADGNVRIVG CAVCGESKKA 

401 QVKEQLARKI EWLPSSAQAL GIRNHYRHPE EHGSDRWFNA LGSRRFSRNA 

451 CVVVSCGTAV TVDALTDDGH YLGGTIMPGF HLMKESLAVR TANLNRPAGK

501 RYPFPTTTGN AVASGMMDAV CGSIMMMHGR LKEKNGAGKP VDVIITGGGA 

551 AKVAEALPPA FLAENTVRVA DNLVIHGLLN LIAAEGGESE HA* 



ORF61ng-1 and ORF61-1 show 93.9% identity in 591 aa overlap:

orf61ng-1.pep MTVLKPSHWRVLAELADGLPQHVSQLAREADMKPQQLNGFWQQMPAHIRGLLRQHDGYWR 60

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
orf61-1 MTVLKLSHWRVLAELADGLPQHVSQLARMADMKPQQLNGFWQQMPAHIRGLLRQHDGYWR 60

orf61ng-1.pep LVRPLAVFDAEGLRDLGERSGFQTALKHECASSNDEILELARIAPDKAHKTICVTHLQSK 120

I I I I I I I I I I I I I I : I 1 1 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
orf61-1 LVRPLAVFDAEGLRELGERSGFQTALKHECASSNDEILELARIAPDKAHKTICVTHLQSK 120

orf61ng-1.pep GRGRQGRKWSHRLGECLMFSFGWAFDRPQYELGSLSPVAALACRRALGCLGLETQIKWPN 180

I I M I I I I I I I I i M I I I I I I I I : I I I I I I I I I I I! I M I : ! I 1 I I I : 
orf61-1 GRGRQGRKWSHRLGECLMFSFGWVFDRPQYELGSLSPVAAVACRRALSRLGLDVQIKWPN 180

orf61ng-1.pep DLVVGRDKLGGILIETVRAGGKTVAVVGIGINFVLPKEVENAASVQSLFQTASRRGNADA 240

I MMMIII I I II II 11:11 Ml I I Mill I M I ! Ill II I I II I I I I I II II I II M I 
orf61-1 DLVVGRDKLGGILIETVRTGGKTVAVVGIGINFVLPKEVENAASVQSLFQTASRRGNADA 240

orf61ng-1.pep AVLLETLLAELGAVLEQYAEEGFAPFLNEYETANRDHGKAVLLLRDGETVCEGTVKGVDG 300

lllllllhll Mi I I I : : M I I I : I I :: I I I I I I I I I I I I I I I I I I MMMMI 
orf61-1 AVLLETLLVELDAVLLQYARDGFAPFVAEYQAANRDHGKAVLLLRDGETVFEGTVKGVDG 300

orf61ng-1.pep RGVLHLETAEGEQTVVSGEISLRPDNRSVSVPKRPDSERFLLLEGGNSRLKWAWVENGTF 360

:|||lllll!l:lllllllllll 1:1 MMM I I I I I I I I : I I I I I I II I I I I I I I I 
orf61-1 QGVLHLETAEGKQTVVSGEISLRSDDRPVSVPKRRDSERFLLLDGGNSRLKWAWVENGTF 360

orf61ng-1.pep ATVGSAPYRDLSPLGAEWAEKADGNVRIVGCAVCGESKKAQVKEQLARKIEWLPSSAQAL 420

M II II M M M II II II M M M M M M M I M I I I II I : I II I I I I I I I I I II II I 
orf61-1 ATVGSAPYRDLSPLGAEWAEKADGNVRIVGCAVCGEFKKAQVQEQLARKIEWLPSSAQAL 420

orf61ng-1.pep GIRNHYRHPEEHGSDRWFNALGSRRFSRNACVVVSCGTAVTVDALTDDGHYLGGTIMPGF 480

I I II I I II I II I I I II I II I I I II II I I I I II I I I II I I I I I I II I I I I I II I I I I I II I 
orf61-1 GIRNHYRHPEEHGSDRWFNALGSRRFSRNACVVVSCGTAVTVDALTDDGHYLGGTIMPGF 480

orf61ng-1.pep HLMKESLAVRTANLNRPAGKRYPFPTTTGNAVASGMMDAVCGSIMMMHGRLKEKNGAGKP 540

I 1 I I 11 II I I I I I I I I I I I I I II I I I I I I II II I II II II I I : I I M I II I II : I I M I 
orf61-1 HLMKESLAVRTANLNRHAGKRYPFPTTTGNAVASGMMDAVCGSVMMMHGRLKEKTGAGKP 540

orf61ng-1.pep VDVIITGGGAAKVAEALPPAFLAENTVRVADNLVIHGLLNLIAAEGGESEHAX 593

I I I I I II I I II I I I I I II II I I I II I I I I II I I II : I I II : I I I II I I I 
orf61-1 VDVIITGGGAAKVAEALPPAFLAENTVRVADNLVIYGLLNMIAAEGREYEHIX 593



Based on this analysis, including the homology with the baf protein of B. pertussis and the presence
of a putative prokaryotic membrane lipoprotein lipid attachment site, it is predicted that these
proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be useful antigens for
vaccines or diagnostics, or for raising antibodies.



Example 29 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 241>: 

1 ATGTTTTACC AAATCCTTGC CCTGATTATC TGGAGCAGCT CGTTTATTGC 

51 CGCCAAATAT GTCTATGGCG GCATCGATCC CGCATTGATG GTCGGCGTGC 

101 GCCTGCTAAT TGCCGCGCTG CCTGCACTGC CCGCCTGCCG CCGTCATGTC 

151 GGCAAGATTC CGCGTGAGGA ATGGAAGCCG TTGCTGATTG TGTCGTTCGT 

201 CAACTATGTG CTGACCCTGC TGCTTCAGTT TGTCGGGTTG AAATACACTT 

251 CCGCCGCCAG CGCATCGGTC ATTGTCGGAC TCGAGCCGCT GCTGATGGTG 

301 TTTGTCGGAC ACTTTTTCTT CAACGACAAA GCGCGTGCCT ACCACTGGAT 

351 ATGCGGCGCG GCGGCATTTG CCGGTGTCGC GCTGCTGATG GCGGGCGGTG 

401 CGGaAGAGGG CGGCGaAGTC GGCTGGTTCG GCTGCCTGCT GGTGTTGTTG 

451 GCGGGCGCGG GCTTTTGTGC CGCTATGCGT CCGACGCAAA GGCTGATTGC 

501 ACGCATCGGC GCACCGGCAT TCACATCTGT TTCCATTGCC GCCGCATCGT 

551 TGATGTGCCT GCCGTTTTCG CTTGCTTTGG CGCAAAGTTA TACCGTGGAC 

601 TGGAGCGTCG GGATGGTATT GTCGCTGCTG TATTTGGGTT TGGGGTGC. . 



This corresponds to the amino acid sequence <SEQ ID 242; ORF62>: 






1 MFYQILALII WSSSFIAAKY VYGGIDPALM VGVRLLIAAL PALPACRRHV 

51 GKIPREEWKP LLIVSFVNYV LTLLLQFVGL KYTSAASASV IVGLEPLLMV 

101 FVGHFFFNDK ARAYHWICGA AAFAGVALLM AGGAEEGGEV GWFGCLLVLL 

151 AGAGFCAAMR PTQRLIARIG APAFTSVSIA AASLMCLPFS LALAQSYTVD 

201 WSVGMVLSLL YLGLGC . 

Further work revealed the complete nucleotide sequence <SEQ ID 243>: 



1 ATGTTTTACC AAATCCTTGC CCTGATTATC TGGAGCAGCT CGTTTATTGC

51 CGCCAAATAT GTCTATGGCG GCATCGATCC CGCATTGATG GTCGGCGTGC

101 GCCTGCTAAT TGCCGCGCTG CCTGCACTGC CCGCCTGCCG CCGTCATGTC

151 GGCAAGATTC CGCGTGAGGA ATGGAAGCCG TTGCTGATTG TGTCGTTCGT

201 CAACTATGTG CTGACCCTGC TGCTTCAGTT TGTCGGGTTG AAATACACTT

251 CCGCCGCCAG CGCATCGGTC ATTGTCGGAC TCGAGCCGCT GCTGATGGTG

301 TTTGTCGGAC ACTTTTTCTT CAACGACAAA GCGCGTGCCT ACCACTGGAT

351 ATGCGGCGCG GCGGCATTTG CCGGTGTCGC GCTGCTGATG GCGGGCGGTG

401 CGGAAGAGGG CGGCGAAGTC GGCTGGTTCG GCTGCCTGCT GGTGTTGTTG

451 GCGGGCGCGG GCTTTTGTGC CGCTATGCGT CCGACGCAAA GGCTGATTGC

501 ACGCATCGGC GCACCGGCAT TCACATCTGT TTCCATTGCC GCCGCATCGT

551 TGATGTGCCT GCCGTTTTCG CTTGCTTTGG CGCAAAGTTA TACCGTGGAC

601 TGGAGCGTCG GGATGGTATT GTCGCTGCTG TATTTGGGTT TGGGGTGCGG

651 CTGGTACGCC TATTGGCTGT GGAACAAGGG GATGAGCCGT GTTCCTGCCA

701 ATGTTTCGGG ACTGTTGATT TCGCTCGAAC CCGTCGTCGG CGTGCTGCTG

751 GCGGTTTTGA TTTTGGGCGA ACACCTGTCG CCCGTGTCCG CCTTGGGCGT

801 GTTTGTCGTC ATCGCCGCCA CCTTGGTTGC CGGCCGGCTG TCGCATCAAA

851 AATAA



This corresponds to the amino acid sequence <SEQ ID 244; ORF62-l>: 



1 MFYQILALII WSSSFIAAKY VYGGIDPALM VGVRLLIAAL PALPACRRHV

51 GKIPREEWKP LLIVSFVNYV LTLLLQFVGL KYTSAASASV IVGLEPLLMV

101 FVGHFFFNDK ARAYHWICGA AAFAGVALLM AGGAEEGGEV GWFGCLLVLL

151 AGAGFCAAMR PTQRLIARIG APAFTSVSIA AASLMCLPFS LALAQSYTVD

201 WSVGMVLSLL YLGLGCGWYA YWLWNKGMSR VPANVSGLLI SLEPVVGVLL

251 AVLILGEHLS PVSALGVFVV IAATLVAGRL SHQK*



Computer analysis of this amino acid sequence gave the following results: 



Homology with hypothetical transmembrane protein HI0976 of H. influenzae (accession number Q57147)
ORF62 and HI0976 show 50% aa identity in 114aa overlap:

Orf62     1 MFYQILALIIWSSSFIAAKYVYGGIDPALMVGVRXXXXXXXXXXXCRRHVGKIPREEWKP 60
            M YQILAL+IWSSS I  K  Y  +DP L+V VR             R   KI +   K
HI0976    1 MLYQILALLIWSSSLIVGKLTYSMMDPVLVVQVRLIIAMIIVMPLFLRRWKKIDKPMRKQ 60

Orf62    61 LLIVSFVNYVLTLLLQFVGLKYTSAASASVIVGLEPLLMVFVGHFFFNDKARAY 114
            L  ++F NY    LLQF+GLKYTSA+SA  ++GLEPLL+VFVGHFFF  K   +
HI0976   61 LWWLAFFNYTAVFLLQFIGLKYTSASSAVTMIGLEPLLVVFVGHFFFKTKQNGF 114

Homology with a predicted ORF from N. meningitidis (strain A) 

ORF62 shows 99.5% identity over a 216aa overlap with an ORF (ORF62a) from strain A of N. meningitidis:



orf62.pep  MFYQILALIIWSSSFIAAKYVYGGIDPALMVGVRLLIAALPALPACRRHVGKIPREEWKP  60
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf62a     MFYQILALIIWSSSFIAAKYVYGGIDPALMVGVRLLIAALPALPACRRHVGKIPREEWKP  60

orf62.pep  LLIVSFVNYVLTLLLQFVGLKYTSAASASVIVGLEPLLMVFVGHFFFNDKARAYHWICGA  120
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf62a     LLIVSFVNYVLTLLLQFVGLKYTSAASASVIVGLEPLLMVFVGHFFFNDKARAYHWICGA  120

orf62.pep  AAFAGVALLMAGGAEEGGEVGWFGCLLVLLAGAGFCAAMRPTQRLIARIGAPAFTSVSIA  180
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf62a     AAFAGVALLMAGGAEEGGEVGWFGCLLVLLAGAGFCAAMRPTQRLIARIGAPAFTSVSIA  180

orf62.pep  AASLMCLPFSLALAQSYTVDWSVGMVLSLLYLGLGC  216
           |||||||||||||||||||||||||||||||||:||
orf62a     AASLMCLPFSLALAQSYTVDWSVGMVLSLLYLGVGCSWYAYWLWNKGMSRVPANVSGLLI  240

orf62a     SLEPVVGVLLAVLILGEHLSPVSVLGVFVVIAATLVAGRLSHQKX  285

The complete length ORF62a nucleotide sequence <SEQ ID 245> is: 



1 ATGTTTTACC AAATCCTTGC CCTGATTATC TGGAGCAGCT CGTTTATTGC 

51 CGCCAAATAT GTCTATGGCG GCATCGATCC CGCATTGATG GTCGGCGTGC 

101 GCCTGCTGAT TGCTGCGCTG CCTGCACTGC CCGCCTGCCG CCGTCATGTC 

151 GGCAAGATTC CGCGTGAGGA ATGGAAGCCG TTGCTGATTG TGTCGTTCGT 

201 CAACTATGTG CTGACCCTGC TACTTCAGTT TGTCGGGTTG AAATACACTT 

251 CCGCCGCCAG CGCATCGGTC ATTGTCGGAC TCGAGCCACT GCTGATGGTG 

301 TTTGTCGGAC ACTTTTTCTT CAACGACAAA GCGCGTGCCT ACCACTGGAT 

351 ATGCGGCGCG GCGGCATTTG CCGGTGTCGC GCTGCTGATG GCGGGCGGTG 

401 CGGAAGAGGG CGGCGAAGTC GGCTGGTTCG GCTGCCTGCT GGTGTTGTTG 

451 GCGGGCGCGG GCTTTTGTGC CGCTATGCGT CCGACGCAAA GGCTGATTGC 

501 ACGCATCGGC GCACCGGCAT TCACATCTGT TTCCATTGCC GCCGCATCGT 

551 TGATGTGCCT GCCGTTTTCG CTTGCTTTGG CGCAAAGTTA TACCGTGGAC 

601 TGGAGCGTCG GAATGGTATT GTCGCTGCTG TATTTGGGCG TGGGGTGCAG 

651 CTGGTACGCC TATTGGCTGT GGAACAAGGG GATGAGCCGT GTTCCTGCCA 

701 ACGTTTCGGG ACTGTTGATT TCGCTCGAAC CCGTCGTCGG CGTGCTGCTG 

751 GCGGTTTTGA TTTTGGGCGA ACACCTGTCG CCCGTGTCCG TCTTGGGCGT 

801 GTTTGTCGTC ATCGCCGCCA CCTTGGTTGC CGGCCGGCTG TCGCATCAAA 

851 AATAA 
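Each nucleotide listing in these examples is stated to encode, or correspond to, the amino acid listing that follows it. That correspondence can be checked by translating codon by codon from the first base. The minimal sketch below assumes the standard genetic code (NCBI translation table 1) and a reading frame that starts at nucleotide 1; the names `translate` and `CODON_TABLE` are illustrative, and this is not presented as the software used to prepare the listings.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1). Codons are enumerated with
# the first base varying slowest, in T, C, A, G order; "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna: str) -> str:
    """Translate an in-frame DNA string, ignoring whitespace and letter case.

    Codons containing ambiguity codes (for example N, k or s in the partial
    listings) are not in the table and are translated as "X". A trailing
    partial codon is ignored.
    """
    seq = "".join(dna.split()).upper()
    usable = len(seq) - len(seq) % 3
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, usable, 3))


if __name__ == "__main__":
    # Fragments copied from the ORF62a listing (SEQ ID 245), in frame with nucleotide 1.
    print(translate("ATGTTTTACC AAATCCTTGC CCTGATTATC"))  # nt 1-30    -> MFYQILALII
    print(translate("TCGCTCGAAC CCGTCGTCGG CGTGCTGCTG"))  # nt 721-750  -> SLEPVVGVLL
    print(translate("TCGCATCAAA AATAA"))                  # nt 841-855  -> SHQK*
```

Translating short stretches this way is a quick check on individual residues of a printed protein listing, for example to tell a single tryptophan (TGG, W) from a pair of valines (GTC GTC, VV).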

This encodes a protein having amino acid sequence <SEQ ID 246>: 



1 MFYQILALII WSSSFIAAKY VYGGIDPALM VGVRLLIAAL PALPACRRHV

51 GKIPREEWKP LLIVSFVNYV LTLLLQFVGL KYTSAASASV IVGLEPLLMV

101 FVGHFFFNDK ARAYHWICGA AAFAGVALLM AGGAEEGGEV GWFGCLLVLL

151 AGAGFCAAMR PTQRLIARIG APAFTSVSIA AASLMCLPFS LALAQSYTVD

201 WSVGMVLSLL YLGVGCSWYA YWLWNKGMSR VPANVSGLLI SLEPVVGVLL

251 AVLILGEHLS PVSVLGVFVV IAATLVAGRL SHQK*



ORF62a and ORF62-1 show 98.9% identity in 284 aa overlap: 



orf62a.pep  MFYQILALIIWSSSFIAAKYVYGGIDPALMVGVRLLIAALPALPACRRHVGKIPREEWKP  60
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf62-1     MFYQILALIIWSSSFIAAKYVYGGIDPALMVGVRLLIAALPALPACRRHVGKIPREEWKP  60

orf62a.pep  LLIVSFVNYVLTLLLQFVGLKYTSAASASVIVGLEPLLMVFVGHFFFNDKARAYHWICGA  120
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf62-1     LLIVSFVNYVLTLLLQFVGLKYTSAASASVIVGLEPLLMVFVGHFFFNDKARAYHWICGA  120

orf62a.pep  AAFAGVALLMAGGAEEGGEVGWFGCLLVLLAGAGFCAAMRPTQRLIARIGAPAFTSVSIA  180
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf62-1     AAFAGVALLMAGGAEEGGEVGWFGCLLVLLAGAGFCAAMRPTQRLIARIGAPAFTSVSIA  180

orf62a.pep  AASLMCLPFSLALAQSYTVDWSVGMVLSLLYLGVGCSWYAYWLWNKGMSRVPANVSGLLI  240
            |||||||||||||||||||||||||||||||||:||:|||||||||||||||||||||||
orf62-1     AASLMCLPFSLALAQSYTVDWSVGMVLSLLYLGLGCGWYAYWLWNKGMSRVPANVSGLLI  240

orf62a.pep  SLEPVVGVLLAVLILGEHLSPVSVLGVFVVIAATLVAGRLSHQKX  285
            |||||||||||||||||||||||:|||||||||||||||||||||
orf62-1     SLEPVVGVLLAVLILGEHLSPVSALGVFVVIAATLVAGRLSHQKX  285

Homology with a predicted ORF from N. gonorrhoeae 

ORF62 shows 99.5% identity over a 216aa overlap with a predicted ORF (ORF62.ng) from N. gonorrhoeae:
orf62.pep  MFYQILALIIWSSSFIAAKYVYGGIDPALMVGVRLLIAALPALPACRRHVGKIPREEWKP  60
           |||||||||||:||||||||||||||||||||||||||||||||||||||||||||||||
orf62ng    MFYQILALIIWGSSFIAAKYVYGGIDPALMVGVRLLIAALPALPACRRHVGKIPREEWKP  60

orf62.pep  LLIVSFVNYVLTLLLQFVGLKYTSAASASVIVGLEPLLMVFVGHFFFNDKARAYHWICGA  120
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf62ng    LLIVSFVNYVLTLLLQFVGLKYTSAASASVIVGLEPLLMVFVGHFFFNDKARAYHWICGA  120

orf62.pep  AAFAGVALLMAGGAEEGGEVGWFGCLLVLLAGAGFCAAMRPTQRLIARIGAPAFTSVSIA  180
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf62ng    AAFAGVALLMAGGAEEGGEVGWFGCLLVLLAGAGFCAAMRPTQRLIARIGAPAFTSVSIA  180

orf62.pep  AASLMCLPFSLALAQSYTVDWSVGMVLSLLYLGLGC  216
           ||||||||||||||||||||||||||||||||||||
orf62ng    AASLMCLPFSLALAQSYTVDWSVGMVLSLLYLGLGCGWYAYWLWNKGMSRVPANASGLLI  240



The complete length ORF62ng nucleotide sequence <SEQ ID 247> is: 

1 ATGTTTTACC AAATCCTTGC CCTGATTATC TGGGGCAGCT CGTTTATTGC 

51 CGCCAAATAT GTCTATGGCG GCATCGATCC CGCATTGATG GTCGGCGTGC 

101 GCCTGCTGAT TGCCGCGCTG CCTGCACTGC CCGCCTGCCG CCGTCATGTC 

151 GGCAAGATTC CGCGTGAGGA ATGGAAGCCG TTGCTGATTG TGTCGTTCGT 

201 CAACTATGTG CTGACCCTGC TGCTTCAGTT TGTCGGGTTG AAATACACTT 

251 CCGCCGCCAG CGCATCGGTC ATTGTCGGAC TCGAGCCGCT GCTGATGGTG 

301 TTTGTCGGAC ACTTTTTCTT CAACGACAAA GCGCGTGCCT ACCACTGGAT 

351 ATGCGGCGCG GCGGCATTTG CCGGTGTCGC GCTGCTGATG GCGGGCGGTG 

401 CGGAAGAGGG CGGCGAAGTC GGCTGGTTCG GCTGCCTGCT GGTGTTGTTG 

451 GCGGGCGCGG GCTTTTGTGC CGCTATGCGT CCGACGCAAA GGCTGATTGC 

501 CCGCATCGGC GCACCGGCAT TCACATCTGT TTCCATTGCC GCCGCATCGT 

551 TGATGTGCCT GCCGTTTTCG CTTGCTTTGG CGCAAAGTTA TACCGTGGAC 

601 TGGAGCGTCG GGATGGTATT GTCGCTGTTG TATTTGGGTT TGGGGTGCGG 

651 CTGGTACGCC TATTGGCTGT GGAACAAGGG GATGAGCCGT GTTCCTGCCA 

701 ACGCGTCGGG ACTGTTGATT TCGCTCGAAC CCGTCGTCGG CGTGCTGTTG 

751 GCGGTTTTGA TTTTGGGCGA ACATTTATCG CCCGTGTCCG CCTTGGGCGT 

801 GTTTGTCGTC ATCGCCGCCA CTTTCGCCGC CGGCCGGCTG TCGCGCAGGG 

851 ACGCGCAAAA CGGCAATGCC GTCTGA 

This encodes a protein having amino acid sequence <SEQ ID 248>: 

1 MFYQILALII WGSSFIAAKY VYGGIDPALM VGVRLLIAAL PALPACRRHV

51 GKIPREEWKP LLIVSFVNYV LTLLLQFVGL KYTSAASASV IVGLEPLLMV

101 FVGHFFFNDK ARAYHWICGA AAFAGVALLM AGGAEEGGEV GWFGCLLVLL

151 AGAGFCAAMR PTQRLIARIG APAFTSVSIA AASLMCLPFS LALAQSYTVD

201 WSVGMVLSLL YLGLGCGWYA YWLWNKGMSR VPANASGLLI SLEPVVGVLL

251 AVLILGEHLS PVSALGVFVV IAATFAAGRL SRRDAQNGNA V*



ORF62ng and ORF62-1 show 97.9% identity in 283 aa overlap: 

orf62ng.pep  MFYQILALIIWGSSFIAAKYVYGGIDPALMVGVRLLIAALPALPACRRHVGKIPREEWKP  60
             |||||||||||:||||||||||||||||||||||||||||||||||||||||||||||||
orf62-1      MFYQILALIIWSSSFIAAKYVYGGIDPALMVGVRLLIAALPALPACRRHVGKIPREEWKP  60

orf62ng.pep  LLIVSFVNYVLTLLLQFVGLKYTSAASASVIVGLEPLLMVFVGHFFFNDKARAYHWICGA  120
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf62-1      LLIVSFVNYVLTLLLQFVGLKYTSAASASVIVGLEPLLMVFVGHFFFNDKARAYHWICGA  120

orf62ng.pep  AAFAGVALLMAGGAEEGGEVGWFGCLLVLLAGAGFCAAMRPTQRLIARIGAPAFTSVSIA  180
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf62-1      AAFAGVALLMAGGAEEGGEVGWFGCLLVLLAGAGFCAAMRPTQRLIARIGAPAFTSVSIA  180

orf62ng.pep  AASLMCLPFSLALAQSYTVDWSVGMVLSLLYLGLGCGWYAYWLWNKGMSRVPANASGLLI  240
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||:|||||
orf62-1      AASLMCLPFSLALAQSYTVDWSVGMVLSLLYLGLGCGWYAYWLWNKGMSRVPANVSGLLI  240

orf62ng.pep  SLEPVVGVLLAVLILGEHLSPVSALGVFVVIAATFAAGRLSRRDAQNGNAVX  292
             ||||||||||||||||||||||||||||||||||::|||||::
orf62-1      SLEPVVGVLLAVLILGEHLSPVSALGVFVVIAATLVAGRLSHQKX  285

Furthermore, ORF62ng shows significant homology to a hypothetical H. influenzae protein:

sp|Q57147|Y976_HAEIN HYPOTHETICAL PROTEIN HI0976 >gi|1074589|pir||B64163
hypothetical protein HI0976 - Haemophilus influenzae (strain Rd KW20)
>gi|1574004 (U32778) hypothetical [Haemophilus influenzae] Length = 128

Score = 106 bits (262), Expect = 2e-22

Identities = 56/114 (49%), Positives = 68/114 (59%)

Query:  1 MFYQILALIIWGSSFIAAKYVYGGIDPALMVGVRXXXXXXXXXXXCRRHVGKIPREEWKP 60
          M YQILAL+IW SS I  K  Y  +DP L+V VR             R   KI +   K
Sbjct:  1 MLYQILALLIWSSSLIVGKLTYSMMDPVLVVQVRLIIAMIIVMPLFLRRWKKIDKPMRKQ 60

Query: 61 LLIVSFVNYVLTLLLQFVGLKYTSAASASVIVGLEPLLMVFVGHFFFNDKARAY 114
          L  ++F NY    LLQF+GLKYTSA+SA  ++GLEPLL+VFVGHFFF  K   +
Sbjct: 61 LWWLAFFNYTAVFLLQFIGLKYTSASSAVTMIGLEPLLVVFVGHFFFKTKQNGF 114



Based on this analysis, including the homology with the transmembrane protein of H. influenzae
and the putative leader sequence and several transmembrane domains in the gonococcal protein,
it is predicted that these proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could 
be useful antigens for vaccines or diagnostics, or for raising antibodies. 
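The percentage identities reported in this example amount to the number of identical aligned residues divided by the length of the overlap. The minimal sketch below reproduces that arithmetic for the ungapped ORF62 alignments. It assumes the sequences as listed in SEQ ID 242, 244, 246 and 248 (terminal stop omitted) and an overlap that starts at residue 1. The function name `percent_identity` is illustrative, and the sketch is not the alignment program used to produce the figures.

```python
# ORF62-family proteins as listed in Example 29, without the terminal stop.
ORF62_1 = (  # SEQ ID 244
    "MFYQILALIIWSSSFIAAKYVYGGIDPALMVGVRLLIAALPALPACRRHVGKIPREEWKP"
    "LLIVSFVNYVLTLLLQFVGLKYTSAASASVIVGLEPLLMVFVGHFFFNDKARAYHWICGA"
    "AAFAGVALLMAGGAEEGGEVGWFGCLLVLLAGAGFCAAMRPTQRLIARIGAPAFTSVSIA"
    "AASLMCLPFSLALAQSYTVDWSVGMVLSLLYLGLGCGWYAYWLWNKGMSRVPANVSGLLI"
    "SLEPVVGVLLAVLILGEHLSPVSALGVFVVIAATLVAGRLSHQK"
)
ORF62A = (  # SEQ ID 246
    "MFYQILALIIWSSSFIAAKYVYGGIDPALMVGVRLLIAALPALPACRRHVGKIPREEWKP"
    "LLIVSFVNYVLTLLLQFVGLKYTSAASASVIVGLEPLLMVFVGHFFFNDKARAYHWICGA"
    "AAFAGVALLMAGGAEEGGEVGWFGCLLVLLAGAGFCAAMRPTQRLIARIGAPAFTSVSIA"
    "AASLMCLPFSLALAQSYTVDWSVGMVLSLLYLGVGCSWYAYWLWNKGMSRVPANVSGLLI"
    "SLEPVVGVLLAVLILGEHLSPVSVLGVFVVIAATLVAGRLSHQK"
)
ORF62NG = (  # SEQ ID 248
    "MFYQILALIIWGSSFIAAKYVYGGIDPALMVGVRLLIAALPALPACRRHVGKIPREEWKP"
    "LLIVSFVNYVLTLLLQFVGLKYTSAASASVIVGLEPLLMVFVGHFFFNDKARAYHWICGA"
    "AAFAGVALLMAGGAEEGGEVGWFGCLLVLLAGAGFCAAMRPTQRLIARIGAPAFTSVSIA"
    "AASLMCLPFSLALAQSYTVDWSVGMVLSLLYLGLGCGWYAYWLWNKGMSRVPANASGLLI"
    "SLEPVVGVLLAVLILGEHLSPVSALGVFVVIAATFAAGRLSRRDAQNGNAV"
)
# The partial ORF62 (SEQ ID 242) matches the first 216 residues of ORF62-1.
ORF62 = ORF62_1[:216]


def percent_identity(a: str, b: str, overlap: int) -> float:
    """Percent identical residues over the first `overlap` positions of an ungapped alignment."""
    if overlap <= 0 or overlap > min(len(a), len(b)):
        raise ValueError("overlap must be between 1 and the length of the shorter sequence")
    matches = sum(1 for x, y in zip(a[:overlap], b[:overlap]) if x == y)
    return 100.0 * matches / overlap


if __name__ == "__main__":
    print(f"ORF62 vs ORF62a:    {percent_identity(ORF62, ORF62A, 216):.1f}%")     # 99.5%
    print(f"ORF62a vs ORF62-1:  {percent_identity(ORF62A, ORF62_1, 284):.1f}%")   # 98.9%
    print(f"ORF62ng vs ORF62-1: {percent_identity(ORF62NG, ORF62_1, 283):.1f}%")  # 97.9%
```

For example, ORF62a and ORF62-1 differ at three of 284 positions, giving 281/284 = 98.9%. Gapped alignments, such as ORF64 against ORF64a, would need matches counted over aligned columns instead, so this sketch does not apply to them unchanged.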



Example 30 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 249>:



1 ATGCGCCGTT TTCTACCGAT CGCAGCCATA TGCGCmGwms TCCTGkkGTA

51 SGGACTGACG GCGGCAACCG GCAGCACCAG TTCGCTGGCG GATTATTTCT

101 GGTGGATTGT TGCGTTCAGC GCAATGCTGC TGCTGGTGTT GTCCGCCGTT

151 TTGGCACGTT ATGTCATATT GCTGTTGAAA GACAGGCGCG ACGGCGTATT

201 CGGTTCGCtA srTyGCCAAA gsGCCTgkks TGGG.ATGTT TACGCTGGTT

251 GCCGkACTGC CCGGCGTGTT TCTGTTCGGC TTTCCCGCAC AGTTCATCAA

301 CGGCACGATT AATTCGTGGT TCGGCAACGA TACCCACGAG GCGCTTGAAC

351 GCAGCCTCAA TTTGAGCAAG TCCGCATTGA ATTTGGCGGC AGACAACGCC

401 CTCGGCAACG CCGTCCCCGT GCAGATAGAC CTCATCGGCG CGGCTTCCCT

451 GCCCGGGGAT ATGGGCAGGG TGCTGGAACA TTACGCCGGC AGCGGTTTTG

501 CCCAGCTTGC CCTGTACAAy ksCGCAAGCG GCAAAATCGA AAAAAGCATC

551 AACCCGCACA AGCTCGATCA GCCGTTTCCA GGTAAGGCGC GTTGGGAaAa

601 AATCCaACGG GCGGGTTCGG TCAGGGATTT GGAAAGCATA GGCGGCGTAT

651 TGTaCGCGCA GGGCTGGCTG TCGGCGGGTA CGCACwACGG GCGCGATTAC

701 GCCTTGTTTT TCCGTCAGCC GGTTCCCAAA GGCGTGGCAG AGGATGCCGT

751 yTTAATCGAA AAGGCAAGGG CGAAATATGC TGAGTTGAGT TACAGCAAAA

801 AAGGTTTGCA GACCTTTTTC CTGGCAACCC TGCTGATTGC CTCGCTGCTG

851 TCGATTTTTC TTGCACTGGT CATGGCACTG TATTTCGCCC GCCGTTTCGT

901 CGAACCCGTC CTATCGCTTG CCGAGGGGGC GAAGGCGGTG GCGCAAGGCG

951 ATTTCAGCCA GACGCGCCCC GTGTTGCGCA ACGACGAGTT CGGACGCTTG

1001 ACCArGTTGT TCAACCACAT GACCGAGCAG CTTTCCATCG CCAAAGATGC

1051 AGACGAGCGC AACCGCCGGC GCGAGGAAGC CGCCAGGCAT TATCTTGAAT

1101 GCGTGTTGGA GGGGCTGACC ACGGGCGTGG TGGTGTTTGA CGAACAAGGC

1151 TGTCTGAAAA CCTTCAACAA AGCGGCGGGT ACC



This corresponds to the amino acid sequence <SEQ ID 250; ORF64>: 



1 MRRFLPIAAI CAXXLXXGLT AATGSTSSLA DYFWWIVAFS AMLLLVLSAV

51 LARYVILLLK DRRDGVFGSX XAKXPXXXMF TLVAXLPGVF LFGFPAQFIN

101 GTINSWFGND THEALERSLN LSKSALNLAA DNALGNAVPV QIDLIGAASL

151 PGDMGRVLEH YAGSGFAQLA LYNXASGKIE KSINPHKLDQ PFPGKARWEK

201 IQRAGSVRDL ESIGGVLYAQ GWLSAGTHXG RDYALFFRQP VPKGVAEDAV

251 LIEKARAKYA ELSYSKKGLQ TFFLATLLIA SLLSIFLALV MALYFARRFV

301 EPVLSLAEGA KAVAQGDFSQ TRPVLRNDEF GRLTXLFNHM TEQLSIAKDA

351 DERNRRREEA ARHYLECVLE GLTTGVVVFD EQGCLKTFNK AAGT..

Further work revealed the complete nucleotide sequence <SEQ ID 251>:

1 ATGCGCCGTT TTCTACCGAT CGCAGCCATA TGCGCCGTCG TCCTGTTGTA 

51 CGGACTGACG GCGGCAACCG GCAGCACCAG TTCGCTGGCG GATTATTTCT 

101 GGTGGATTGT TGCGTTCAGC GCAATGCTGC TGCTGGTGTT GTCCGCCGTT 

151 TTGGCACGTT ATGTCATATT GCTGTTGAAA GACAGGCGCG ACGGCGTATT 

201 CGGTTCGCAG ATTGCCAAAC GCCTTTCTGG GATGTTTACG CTGGTTGCCG 

251 TACTGCCCGG CGTGTTTCTG TTCGGCGTTT CCGCACAGTT CATCAACGGC 

301 ACGATTAATT CGTGGTTCGG CAACGATACC CACGAGGCGC TTGAACGCAG 

351 CCTCAATTTG AGCAAGTCCG CATTGAATTT GGCGGCAGAC AACGCCCTCG 

401 GCAACGCCGT CCCCGTGCAG ATAGACCTCA TCGGCGCGGC TTCCCTGCCC 

451 GGGGATATGG GCAGGGTGCT GGAACATTAC GCCGGCAGCG GTTTTGCCCA 

501 GCTTGCCCTG TACAATGCCG CAAGCGGCAA AATCGAAAAA AGCATCAACC 

551 CGCACAAGCT CGATCAGCCG TTTCCAGGTA AGGCGCGTTG GGAAAAAATC 

601 CAACGGGCGG GTTCGGTCAG GGATTTGGAA AGCATAGGCG GCGTATTGTA 

651 CGCGCAGGGC TGGCTGTCGG CGGGTACGCA CAACGGGCGC GATTACGCCT 

701 TGTTTTTCCG TCAGCCGGTT CCCAAAGGCG TGGCAGAGGA TGCCGTCTTA 

751 ATCGAAAAGG CAAGGGCGAA ATATGCTGAG TTGAGTTACA GCAAAAAAGG 

801 TTTGCAGACC TTTTTCCTGG CAACCCTGCT GATTGCCTCG CTGCTGTCGA 

851 TTTTTCTTGC ACTGGTCATG GCACTGTATT TCGCCCGCCG TTTCGTCGAA 

901 CCCGTCCTAT CGCTTGCCGA GGGGGCGAAG GCGGTGGCGC AAGGCGATTT 

951 CAGCCAGACG CGCCCCGTGT TGCGCAACGA CGAGTTCGGA CGCTTGACCA 

1001 AGTTGTTCAA CCACATGACC GAGCAGCTTT CCATCGCCAA AGAAGCAGAC 

1051 GAGCGCAACC GCCGGCGCGA GGAAGCCGCC AGGCATTATC TTGAATGCGT 

1101 GTTGGAGGGG CTGACCACGG GCGTGGTGGT GTTTGACGAA CAAGGCTGTC 

1151 TGAAAACCTT CAACAAAGCG GCGGAACAGA TTTTGGGGAT GCCGCTTACC 

1201 CCCCTGTGGG GCAGCAGCCG GCACGGTTGG CACGGCGTTT CGGCGCAGCA 

1251 GTCCCTGCTT GCCGAAGTGT TTGCCGCCAT CGGCGCGGCG GCAGGTACGG 

1301 ACAAACCGGT CCATGTGAAA TATGCCGCGC CGGACGATGC CAAAATCCTG 

1351 CTGGGCAAGG CAACCGTCCT GCCCGAAGAC AACGGCAACG GCGTGGTAAT 

1401 GGTGATTGAC GACATCACCG TTTTGATACA CGCGCAAAAA GAAGCCGCGT 

1451 GGGGCGAAGT GGCGAAGCGG CTGGCACACG AAATCCGCAA TCCGCTCACG 

1501 CCCATCCAGC TTTCCGCCGA ACGGCTGGCG TGGAAATTGG GCGGGAAGCT 

1551 GGATGAGCAG GATGCGCAAA TCCTGACGCG TTCGACCGAC ACCATCGTCA 

1601 AACAGGTGGG GGCATTGAAG GAAATGGTCG AAGCATTCCG CAATTATGCG 

1651 CGTTCCCCTT CGCTCAAATT GGAAAATCAG GATTTGAACG CCTTAATCGG 

1701 CGATGTGTTG GCATTGTATG AAGCCGGTCC GTGCCGGTTT GCGGCGGAGC 

1751 TTGCCGGCGA ACCGCTGACG GTGGCGGCGG ATACGACCGC CATGCGGCAG 

1801 GTGCTGCACA ATATTTTCAA AAATGCCGCC GAAGCGGCGG AAGAAGCCGA 

1851 TGTGCCCGAA GTCAGGGTAA AATCGGAAAC AGGGCAGGAC GGTCGGATTG 

1901 TCCTGACGGT TTGCGACAAC GGCAAAGGGT TCGGCAGGGA AATGCTGCAC 

1951 AACGCCTTCG AGCCGTATGT AACGGACAAA CCGGCGGGAA CGGGATTGGG 

2001 TCTGCCTGTG GTGAAAAAAA TCATTGAAGA ACACGGCGGC CGCATCAGCC 

2051 TGAGCAATCA GGATGCGGGT GGCGCGTGTG TCAGAATCAT CTTGCCAAAA 

2101 ACGGTAAAAA CTTATGCGTA G 

This corresponds to the amino acid sequence <SEQ ID 252; ORF64-l>: 

1 MRRFLPIAAI CAVVLLYGLT AATGSTSSLA DYFWWIVAFS AMLLLVLSAV

51 LARYVILLLK DRRDGVFGSQ IAKRLSGMFT LVAVLPGVFL FGVSAQFING

101 TINSWFGNDT HEALERSLNL SKSALNLAAD NALGNAVPVQ IDLIGAASLP

151 GDMGRVLEHY AGSGFAQLAL YNAASGKIEK SINPHKLDQP FPGKARWEKI

201 QRAGSVRDLE SIGGVLYAQG WLSAGTHNGR DYALFFRQPV PKGVAEDAVL

251 IEKARAKYAE LSYSKKGLQT FFLATLLIAS LLSIFLALVM ALYFARRFVE

301 PVLSLAEGAK AVAQGDFSQT RPVLRNDEFG RLTKLFNHMT EQLSIAKEAD

351 ERNRRREEAA RHYLECVLEG LTTGVVVFDE QGCLKTFNKA AEQILGMPLT

401 PLWGSSRHGW HGVSAQQSLL AEVFAAIGAA AGTDKPVHVK YAAPDDAKIL

451 LGKATVLPED NGNGVVMVID DITVLIHAQK EAAWGEVAKR LAHEIRNPLT

501 PIQLSAERLA WKLGGKLDEQ DAQILTRSTD TIVKQVAALK EMVEAFRNYA

551 RSPSLKLENQ DLNALIGDVL ALYEAGPCRF AAELAGEPLT VAADTTAMRQ

601 VLHNIFKNAA EAAEEADVPE VRVKSETGQD GRIVLTVCDN GKGFGREMLH

651 NAFEPYVTDK PAGTGLGLPV VKKIIEEHGG RISLSNQDAG GACVRIILPK

701 TVKTYA*

Computer analysis of this amino acid sequence gave the following results: 






Homology with a predicted ORF from N. meningitidis (strain A)

ORF64 shows 92.6% identity over a 392aa overlap with an ORF (ORF64a) from strain A of N. meningitidis:



orf64.pep  MRRFLPIAAICAXXLXXGLTAATGSTSSLADYFWWIVAFSAMLLLVLSAVLARYVILLLK  60
           ||||||||||||  |  |||||||||||||||||||||||||||||||||||||||||||
orf64a     MRRFLPIAAICAVVLLYGLTAATGSTSSLADYFWWIVAFSAMLLLVLSAVLARYVILLLK  60

orf64.pep  DRRDGVFGSXXAKXPXXXMFTLVAXLPGVFLFGFPAQFINGTINSWFGNDTHEALERSLN  120
           |||||||||  ||     |||||| ||||||||: |||||||||||||||||||||||||
orf64a     DRRDGVFGSQIAKR-LSGMFTLVAVLPGVFLFGVSAQFINGTINSWFGNDTHEALERSLN  119

orf64.pep  LSKSALNLAADNALGNAVPVQIDLIGAASLPGDMGRVLEHYAGSGFAQLALYNXASGKIE  180
           |||||||||||||||||:||||| ||||||| ||||||||||||||||||||| ||||||
orf64a     LSKSALNLAADNALGNAIPVQIDXIGAASLPXDMGRVLEHYAGSGFAQLALYNAASGKIE  179

orf64.pep  KSINPHKLDQPFPGKARWEKIQRAGSVRDLESIGGVLYAQGWLSAGTHXGRDYALFFRQP  240
           ||||||||||||||||||||||:|||||| ||||||||| ||||| || |||||||||||
orf64a     KSINPHKLDQPFPGKARWEKIQQAGSVRDXESIGGVLYAXGWLSAXTHNGRDYALFFRQP  239

orf64.pep  VPKGVAEDAVLIEKARAKYAELSYSKKGLQTFFLATLLIASLLSIFLALVMALYFARRFV  300
           |||||||||||||||||    |||||||||||||||||||||||||||||||||||||||
orf64a     VPKGVAEDAVLIEKARAXXXXLSYSKKGLQTFFLATLLIASLLSIFLALVMALYFARRFV  299

orf64.pep  EPVLSLAEGAKAVAQGDFSQTRPVLRNDEFGRLTXLFNHMTEQLSIAKDADERNRRREEA  360
           |||||||||||||||||||||||||||||||||| |||||||||||||:|||||||||||
orf64a     EPVLSLAEGAKAVAQGDFSQTRPVLRNDEFGRLTKLFNHMTEQLSIAKEADERNRRREEA  359

orf64.pep  ARHYLECVLEGLTTGVVVFDEQGCLKTFNKAAGT  394
           ||||||||||||||||||||||||||||||||
orf64a     ARHYLECVLEGLTTGVVVFDEQGCLKTFNKAAEQILGMPLTPLWGSSRHGWHGVSAQQSL  419

orf64a     LAEVFAAIGAAAGTDKPVHVKYAAPDDAKILLGKATVLPEDNXNGVVMVIDDITVLIHAQ  479

The complete length ORF64a nucleotide sequence <SEQ ID 253> is: 



1 ATGCGCCGTT TTCTACCGAT CGCAGCCATA TGCGCCGTCG TCCTGTTGTA 

51 CGGACTGACG GCGGCAACCG GCAGCACCAG TTCGCTGGCG GATTATTTCT 

101 GGTGGATTGT TGCGTTCAGC GCAATGCTGC TGCTGGTGTT GTCCGCCGTT 

151 TTGGCACGTT ATGTCATATT GCTGTTGAAA GACAGGCGCG ACGGCGTATT 

201 CGGTTCGCAG ATTGCCAAAC GCCTTTCCGG GATGTTTACG CTGGTTGCCG 

251 TACTGCCCGG CGTGTTTCTG TTCGGCGTTT CCGCACAGTT TATCAACGGC 

301 ACGATTAATT CGTGGTTCGG CAACGATACC CACGAGGCGC TTGAACGCAG 

351 CCTCAATTTG AGCAAGTCCG CATTGAATCT GGCGGCAGAC AACGCCCTTG 

401 GCAACGCCAT CCCCGTGCAG ATAGACNTCA TCGGCGCGGC TTCCCTGCCC 

451 NGGGATATGG GCAGGGTGCT GGAACATTAC GCCGGCAGCG GTTTTGCCCA 

501 GCTTGCCCTG TACAATGCCG CAAGCGGCAA AATCGAAAAA AGCATCAACC 

551 CGCACAAGCT CGATCAGCCG TTTCCAGGTA AGGCGCGTTG GGAAAAAATC 

601 CAACAGGCGG GTTCGGTCAG GGATNNGGAA AGCATAGGCG GCGTATTGTA 

651 CGCGCANGGC TGGCTGTCGG CAGNNACGCA CAACGGGCGC GATTACGCCT 

701 TGTTTTTCCG TCAGCCGGTT CCCAAAGGCG TGGCAGAGGA TGCCGTCTTA 

751 ATCGAAAAGG CAAGGGCGNA ANANNNTNAG TTGAGTTACA GCAAAAAAGG 

801 TTTGCAGACC TTTTTCCTNG CAACCCTGCT GATTGCCTCN CTGCTGTCGA 

851 TTTTTCTTGC ACTGGTCATG GCACTGTATT TCGCCCGCCG TTTCGTCGAA 
901 CCCGTCCTAT CGCTTGCCGA GGGGGCGAAG GCGGTGGCGC AAGGCGATTT

951 CAGCCAGACG CGCCCCGTGT TGCGCAACGA CGAGTTCGGA CGCTTGACCA 

1001 AGTTGTTCAA CCACATGACC GAGCAGCTTT CCATCGCCAA AGAAGCAGAC 

1051 GAGCGCAACC GCCGGCGCGA GGAAGCCGCC AGACATTATC TCGAATGCGT 

1101 GTTGGAGGGG CTGACCACGG GCGTGGTGGT GTTTGACGAA CAAGGCTGTC 

1151 TGAAAACCTT CAACAAAGCG GCGGAACAGA TTTTGGGGAT GCCGCTTACC 

1201 CCCCTGTGGG GCAGCAGCCG GCACGGTTGG CACGGCGTTT CGGCGCAGCA 

1251 GTCCCTGCTT GCCGAAGTGT TTGCCGCCAT CGGCGCGGCG GCAGGTACGG 

1301 ACAAACCGGT CCATGTGAAA TATGCCGCGC CGGACGATGC CAAAATCCTG 

1351 CTGGGCAAGG CAACCGTCCT GCCCGAAGAC AACNGCAACG GCGTGGTAAT 

1401 GGTGATTGAC GACATCACCG TTTTGATACA CGCGCAAAAA GAAGCCGCGT 

1451 GGGGCGAAGT GGCAAAACGG CTGGCACACG AAATCCGCAA TCCGCTCACG 

1501 CCCATCCAGC TTTCTGCCGA ACGGCTGGCG TGGAAATTGG GCGGGAAGCT 

1551 GGACGAGCAN GACGCGCAAA TCCTGACACG TTCGACCGAC ACCATCATCA 

1601 AACAAGTGGC GGCATTAAAA GAAATGGTCG AGGCATTCCG CAATTACNCG 

1651 CGTTCCCCTT CGNCTCAATT GGAAAATCAG GATTTGAACG CCTTAATCGG 

1701 CGATGTGTTG GCATTGTACG AAGCTGGTCC GTGCCGGTTT GCGGCGGAAC 

1751 TTGCCGGCGA ACCGCTGATG ATGGCGGCGG ATACGACCGC CATGCGGCAG 

1801 GTGCTGCACA ATATTTTCAA AAATGCCGCC GAAGCGGCGG AAGAAGCCGA 

1851 TGTGCCCGAA GTCAGGGTAA AATCGGAAGC GGGGCAGGAC GGACGGATTG 

1901 TCCTGACAGT TTGCGACAAC GGCAAGGGGT TCGGCAGGGA AATGCTGCAC 

1951 AATGCCTTCG AGCCGTATGT AACGGACAAA CCGGCTGGAA CGGGATTGNG 

2001 ACTGCCCGTG GTGAAAAAAA TCATTGAAGA ACACGGCGGC CNCATCAGCC 

2051 TGAGCAATCA GGATGCGGGC GGCGCGTNTG TCAGAATCAT CTTGCCAAAA 

2101 ACGGTAGAAA CTTATGCGTA G 

This encodes a protein having amino acid sequence <SEQ ID 254>: 

1 MRRFLPIAAI CAVVLLYGLT AATGSTSSLA DYFWWIVAFS AMLLLVLSAV

51 LARYVILLLK DRRDGVFGSQ IAKRLSGMFT LVAVLPGVFL FGVSAQFING

101 TINSWFGNDT HEALERSLNL SKSALNLAAD NALGNAIPVQ IDXIGAASLP

151 XDMGRVLEHY AGSGFAQLAL YNAASGKIEK SINPHKLDQP FPGKARWEKI

201 QQAGSVRDXE SIGGVLYAXG WLSAXTHNGR DYALFFRQPV PKGVAEDAVL

251 IEKARAXXXX LSYSKKGLQT FFLATLLIAS LLSIFLALVM ALYFARRFVE

301 PVLSLAEGAK AVAQGDFSQT RPVLRNDEFG RLTKLFNHMT EQLSIAKEAD

351 ERNRRREEAA RHYLECVLEG LTTGVVVFDE QGCLKTFNKA AEQILGMPLT

401 PLWGSSRHGW HGVSAQQSLL AEVFAAIGAA AGTDKPVHVK YAAPDDAKIL

451 LGKATVLPED NXNGVVMVID DITVLIHAQK EAAWGEVAKR LAHEIRNPLT

501 PIQLSAERLA WKLGGKLDEX DAQILTRSTD TIIKQVAALK EMVEAFRNYX

551 RSPSXQLENQ DLNALIGDVL ALYEAGPCRF AAELAGEPLM MAADTTAMRQ

601 VLHNIFKNAA EAAEEADVPE VRVKSEAGQD GRIVLTVCDN GKGFGREMLH

651 NAFEPYVTDK PAGTGLXLPV VKKIIEEHGG XISLSNQDAG GAXVRIILPK

701 TVETYA*

ORF64a and ORF64-1 show 96.6% identity in 706 aa overlap: 

orf64a.pep  MRRFLPIAAICAVVLLYGLTAATGSTSSLADYFWWIVAFSAMLLLVLSAVLARYVILLLK  60
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf64-1     MRRFLPIAAICAVVLLYGLTAATGSTSSLADYFWWIVAFSAMLLLVLSAVLARYVILLLK  60

orf64a.pep  DRRDGVFGSQIAKRLSGMFTLVAVLPGVFLFGVSAQFINGTINSWFGNDTHEALERSLNL  120
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf64-1     DRRDGVFGSQIAKRLSGMFTLVAVLPGVFLFGVSAQFINGTINSWFGNDTHEALERSLNL  120

orf64a.pep  SKSALNLAADNALGNAIPVQIDXIGAASLPXDMGRVLEHYAGSGFAQLALYNAASGKIEK  180
            ||||||||||||||||:||||| ||||||| |||||||||||||||||||||||||||||
orf64-1     SKSALNLAADNALGNAVPVQIDLIGAASLPGDMGRVLEHYAGSGFAQLALYNAASGKIEK  180

orf64a.pep  SINPHKLDQPFPGKARWEKIQQAGSVRDXESIGGVLYAXGWLSAXTHNGRDYALFFRQPV  240
            |||||||||||||||||||||:|||||| ||||||||| ||||| |||||||||||||||
orf64-1     SINPHKLDQPFPGKARWEKIQRAGSVRDLESIGGVLYAQGWLSAGTHNGRDYALFFRQPV  240

orf64a.pep  PKGVAEDAVLIEKARAXXXXLSYSKKGLQTFFLATLLIASLLSIFLALVMALYFARRFVE  300
            ||||||||||||||||    ||||||||||||||||||||||||||||||||||||||||
orf64-1     PKGVAEDAVLIEKARAKYAELSYSKKGLQTFFLATLLIASLLSIFLALVMALYFARRFVE  300

orf64a.pep  PVLSLAEGAKAVAQGDFSQTRPVLRNDEFGRLTKLFNHMTEQLSIAKEADERNRRREEAA  360
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf64-1     PVLSLAEGAKAVAQGDFSQTRPVLRNDEFGRLTKLFNHMTEQLSIAKEADERNRRREEAA  360

orf64a.pep  RHYLECVLEGLTTGVVVFDEQGCLKTFNKAAEQILGMPLTPLWGSSRHGWHGVSAQQSLL  420
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf64-1     RHYLECVLEGLTTGVVVFDEQGCLKTFNKAAEQILGMPLTPLWGSSRHGWHGVSAQQSLL  420

orf64a.pep  AEVFAAIGAAAGTDKPVHVKYAAPDDAKILLGKATVLPEDNXNGVVMVIDDITVLIHAQK  480
            ||||||||||||||||||||||||||||||||||||||||| ||||||||||||||||||
orf64-1     AEVFAAIGAAAGTDKPVHVKYAAPDDAKILLGKATVLPEDNGNGVVMVIDDITVLIHAQK  480

orf64a.pep  EAAWGEVAKRLAHEIRNPLTPIQLSAERLAWKLGGKLDEXDAQILTRSTDTIIKQVAALK  540
            ||||||||||||||||||||||||||||||||||||||| ||||||||||||:|||||||
orf64-1     EAAWGEVAKRLAHEIRNPLTPIQLSAERLAWKLGGKLDEQDAQILTRSTDTIVKQVAALK  540

orf64a.pep  EMVEAFRNYXRSPSXQLENQDLNALIGDVLALYEAGPCRFAAELAGEPLMMAADTTAMRQ  600
            ||||||||| |||| :||||||||||||||||||||||||||||||||| :|||||||||
orf64-1     EMVEAFRNYARSPSLKLENQDLNALIGDVLALYEAGPCRFAAELAGEPLTVAADTTAMRQ  600

orf64a.pep  VLHNIFKNAAEAAEEADVPEVRVKSEAGQDGRIVLTVCDNGKGFGREMLHNAFEPYVTDK  660
            ||||||||||||||||||||||||||:|||||||||||||||||||||||||||||||||
orf64-1     VLHNIFKNAAEAAEEADVPEVRVKSETGQDGRIVLTVCDNGKGFGREMLHNAFEPYVTDK  660

orf64a.pep  PAGTGLXLPVVKKIIEEHGGXISLSNQDAGGAXVRIILPKTVETYAX  707
            |||||| ||||||||||||| ||||||||||| |||||||||:||||
orf64-1     PAGTGLGLPVVKKIIEEHGGRISLSNQDAGGACVRIILPKTVKTYAX  707



Homology with a predicted ORF from N. gonorrhoeae

ORF64 shows 86.6% identity over a 387aa overlap with a predicted ORF (ORF64.ng) from N. 
gonorrhoeae: 



orf64.pep  MRRFLPIAAICAXXLXXGLTAATGSTSSLADYFWWIVAFSAMLLLVLSAVLARYVILLLK  60
           ||||||||||||  |  ||||||||||||||||||||:||||||||||||||||||||||
orf64ng    MRRFLPIAAICAVVLLYGLTAATGSTSSLADYFWWIVSFSAMLLLVLSAVLARYVILLLK  60

orf64.pep  DRRDGVFGSXXAKXPXXXMFTLVAXLPGVFLFGFPAQFINGTINSWFGNDTHEALERSLN  120
           |||:|||||  ||     |||||| |||:||||: |||||||||||||||||||||||||
orf64ng    DRRNGVFGSQIAKR-LSGMFTLVAVLPGLFLFGISAQFINGTINSWFGNDTHEALERSLN  119

orf64.pep  LSKSALNLAADNALGNAVPVQIDLIGAASLPGDMGRVLEHYAGSGFAQLALYNXASGKIE  180
           ||||||:||||||::|||||||||||:||| |:|| ||||||||||||||||| ||||||
orf64ng    LSKSALDLAADNAVSNAVPVQIDLIGTASLSGNMGSVLEHYAGSGFAQLALYNAASGKIE  179

orf64.pep  KSINPHKLDQPFPGKARWEKIQRAGSVRDLESIGGVLYAQGWLSAGTHXGRDYALFFRQP  240
           ||||||::|||:| |: ||:||::||||:||||||||||||||||||| |||||||||||
orf64ng    KSINPHQFDQPLPDKEHWEQIQQTGSVRSLESIGGVLYAQGWLSAGTHNGRDYALFFRQP  239

orf64.pep  VPKGVAEDAVLIEKARAKYAELSYSKKGLQTFFLATLLIASLLSIFLALVMALYFARRFV  300
           :|: ||:||||||||||||||||||||||||||||:|||||||||||||||||||||||||
orf64ng    IPENVAQDAVLIEKARAKYAELSYSKKGLQTFFLVTLLIASLLSIFLALVMALYFARRFV  299

orf64.pep  EPVLSLAEGAKAVAQGDFSQTRPVLRNDEFGRLTXLFNHMTEQLSIAKDADERNRRREEA  360
           ||:||||||||||||||||||||||||||||||| |||||||||||||:|||||||||||
orf64ng    EPILSLAEGAKAVAQGDFSQTRPVLRNDEFGRLTKLFNHMTEQLSIAKEADERNRRREEA  359

orf64.pep  ARHYLECVLEGLTTGVVVFDEQGCLKTFNKAAGT  394
           |||||||||:||||||||    :| :|
orf64ng    ARHYLECVLDGLTTGVVVSYPLSCCRTAVFSTCHSSPLSYF  400



An ORF64ng nucleotide sequence <SEQ ID 255> was predicted to encode a protein having amino 
acid sequence <SEQ ID 256>: 



1 MRRFLPIAAI CAVVLLYGLT AATGSTSSLA DYFWWIVSFS AMLLLVLSAV

51 LARYVILLLK DRRNGVFGSQ IAKRLSGMFT LVAVLPGLFL FGISAQFING

101 TINSWFGNDT HEALERSLNL SKSALDLAAD NAVSNAVPVQ IDLIGTASLS

151 GNMGSVLEHY AGSGFAQLAL YNAASGKIEK SINPHQFDQP LPDKEHWEQI

201 QQTGSVRSLE SIGGVLYAQG WLSAGTHNGR DYALFFRQPI PENVAQDAVL

251 IEKARAKYAE LSYSKKGLQT FFLVTLLIAS LLSIFLALVM ALYFARRFVE

301 PILSLAEGAK AVAQGDFSQT RPVLRNDEFG RLTKLFNHMT EQLSIAKEAD

351 ERNRRREEAA RHYLECVLDG LTTGVVVSYP LSCCRTAVFS TCHSSPLSYF*

Further work revealed the complete gonococcal DNA sequence <SEQ ID 257>: 

1 ATGCGCCGCT TCCTACCGAT CGCAGCCATA TGCGCCGTCG TCCTGCTGTA 

51 CGGATTGACG GCGGCGACCG GCAGCACCAG TTCGCTGGCG GATTATTTCT 

101 GGTGGATAGT CTCGTTCAGC GCAATGCTGC TGCTGGTGTT GTCCGCCGTT 

151 TTGGCACGTT ATGTCATATT GCTGTTGAAA GACAGGCGCA ACGGCGTGTT 

201 CGGTTCGCAG ATTGCCAAAC GCCTTTCCGG GATGTTCACG CTGGTCGCCG 

251 TACTGCCCGG CTTGTTCCTG TTCGGCATTT CCGCGCAGTT TATCAACGGC 

301 ACGATTAATT CGTGGTTCGG CAACGACACC CACGAAGCCC TCGAACGCAG 

351 CCTTAATTTG AGCAAGTCCG CACTGGATTT GGCGGCAGAC AATGCCGTCA 

401 GCAACGCCGT TCCCGTACAG ATAGACCTCA TCGGCACCGC CTCCCTGTCG

451 GGCAATATGG GCAGTGTGCT GGAACACTAC GCCGGCAGCG GTTTTGCCCA 

501 GCTTGCCCTG TACAATGCCG CAAGCGGGAA AATCGAAAAA AGCATCAATC 

551 CGCACCAATT CGACCAGCCG CTTCCCGACA AAGAACATTG GGAACAGATT 

601 CAGCAGACCG GTTCGGTTCG GAGTTTGGAA AGCATAGGCG GCGTATTGTA 

651 CGCGCAGGGA TGGTTGTCGG CAGGTACGCA CAACGGGCGC GATTACGCGC 

701 TGTTCTTCCG CCAGCCGATT CCCGAAAATG TGGCACAGGA TGCCGTTCTG 

751 ATTGAAAAGG CGCGGGCGAA ATATGCCGAA TTGAGTTACA GCAAAAAAGG 

801 TTTGCAGACC TTTTTTCTGG TAACCCTGCT GATTGCCTCG CTGCTGTCGA 

851 TTTTTCTTGC GCTGGTAATG GCACTGTATT TTGCCCGCCG TTTCGTCGAA 

901 CCCATTCTGT CGCTTGCCGA GGGCGCAAAG GCGGTGGCGC AGGGTGATTT

951 CAGCCAGACG CGCCCCGTAT TGCGCAACGA CGAGTTCGGA CGTTTGACCA 

1001 AGCTGTTCAA CCATATGACC GAGCAGCTTT CCATCGCCAA AGAAGCAGAC 

1051 GAACGCAACC GCCGGCGCGA GGAAGCCGCC CGTCACTACC TCGAGTGCGT 

1101 GTTGGATGGG TTGACTACCG GTGTGGTGGT GTTTGACGAA AAAGGCCGTT 

1151 TGAAAACCTT CAACAAGGCG GCGGAACAGA TTTTGGGGAT GCCGCTCGCC 

1201 CCCCTGTGGG GCAGCAGCCG GCACGGTTGG CACGGCGTTT CGGCGCAGCA 

1251 GTCCCTGCTT GCCGAAGTGT TtgccgccAT CGGTGCGGCG GCAGGTACGG 

1301 ACAAACCGGT CCAGGTGGAA TATGCCGCGC CGGACGATGC CAAAATCCTG 

1351 CTGGGCAAGG CGACGGTATT GCCCGAAGAC AACGGCAACG GCGTGGTGAT 

1401 GGTGATTGAC GACATCACCG TGCTGATACG CGCGCAAAAA GAAGCCGCGT 

1451 GGGGTGAAGT GGCGAAGCGG CTGGCACACG AAATCCGCAA TCCGCTCACG 

1501 CCCATCCAGC TTTCCGCCGA ACGGCTGGCG TGGAAATTGG GCGGGAAGCT 

1551 GGACGATCAG GACGCGCAAA TCCTGACGCG TtcgACCGAC ACCATCATCA 

1601 AACAGgtggc gGCGTTAAAA GAAATGGTCG AGGCATTCCG CAATTACGCG 

1651 CGCGCCCCTT CGCTCAAACT GGAAAATCAG GATTTGAACG CCTTAATCGG 

1701 CGATGTTTTG GCCCTGTACG AAGCCGGCCC GTGCCGGTTT GAGGCGGAAC 

1751 TTGCCGGCGA ACCGCTGATG ATGGCGGCGG ATACGACCGC CATGCGGCAG 

1801 GTGCTGCACA ATATTTTCAA AAATGCCGCC GAAGCGGCGG AAGAAGCCGA 

1851 TATGCCCGAA GTCAGGGTAA AATCGGAAAC GGGGCAGGAC GGACGGATTG 

1901 TCCTGACGGT TTGCGACAAC GGCAAGGGAT TCGGCAAGGA AATGCTGCAC 

1951 AATGCTTTCG AGCCGTATGT GACGGATAAG CCGGCGGGAA CGGGACTGGG 

2001 TCTGCCTGTA GTGAAAAAAA TCATTGGAGA ACACGGCGGC CGCATCAGCC 

2051 TGAGCAATCA GGATGCGGGT GGGGCGTGTG TCAGAATCAT CTTGCCAAAA 

2101 ACGGTAGAAA CTTATGCGTA G 
This corresponds to the amino acid sequence <SEQ ID 258; ORF64ng-1>:

1 MRRFLPIAAI CAVVLLYGLT AATGSTSSLA DYFWWIVSFS AMLLLVLSAV
51 LARYVILLLK DRRNGVFGSQ IAKRLSGMFT LVAVLPGLFL FGISAQFING
101 TINSWFGNDT HEALERSLNL SKSALDLAAD NAVSNAVPVQ IDLIGTASLS
151 GNMGSVLEHY AGSGFAQLAL YNAASGKIEK SINPHQFDQP LPDKEHWEQI
201 QQTGSVRSLE SIGGVLYAQG WLSAGTHNGR DYALFFRQPI PENVAQDAVL
251 IEKARAKYAE LSYSKKGLQT FFLVTLLIAS LLSIFLALVM ALYFARRFVE
301 PILSLAEGAK AVAQGDFSQT RPVLRNDEFG RLTKLFNHMT EQLSIAKEAD
351 ERNRRREEAA RHYLECVLDG LTTGVVVFDE KGRLKTFNKA AEQILGMPLA
401 PLWGSSRHGW HGVSAQQSLL AEVFAAIGAA AGTDKPVQVE YAAPDDAKIL
451 LGKATVLPED NGNGVVMVID DITVLIRAQK EAAWGEVAKR LAHEIRNPLT
501 PIQLSAERLA WKLGGKLDDQ DAQILTRSTD TIIKQVAALK EMVEAFRNYA
551 RAPSLKLENQ DLNALIGDVL ALYEAGPCRF EAELAGEPLM MAADTTAMRQ
601 VLHNIFKNAA EAAEEADMPE VRVKSETGQD GRIVLTVCDN GKGFGKEMLH
651 NAFEPYVTDK PAGTGLGLPV VKKIIGEHGG RISLSNQDAG GACVRIILPK
701 TVETYA*
ORF64ng-1 and ORF64-1 show 93.8% identity in 706 aa overlap:

orf64ng-1.pep  MRRFLPIAAICAVVLLYGLTAATGSTSSLADYFWWIVSFSAMLLLVLSAVLARYVILLLK  60
orf64-1        MRRFLPIAAICAVVLLYGLTAATGSTSSLADYFWWIVAFSAMLLLVLSAVLARYVILLLK  60

orf64ng-1.pep  DRRNGVFGSQIAKRLSGMFTLVAVLPGLFLFGISAQFINGTINSWFGNDTHEALERSLNL 120
orf64-1        DRRDGVFGSQIAKRLSGMFTLVAVLPGVFLFGVSAQFINGTINSWFGNDTHEALERSLNL 120

orf64ng-1.pep  SKSALDLAADNAVSNAVPVQIDLIGTASLSGNMGSVLEHYAGSGFAQLALYNAASGKIEK 180
orf64-1        SKSALNLAADNALGNAVPVQIDLIGAASLPGDMGRVLEHYAGSGFAQLALYNAASGKIEK 180

orf64ng-1.pep  SINPHQFDQPLPDKEHWEQIQQTGSVRSLESIGGVLYAQGWLSAGTHNGRDYALFFRQPI 240
orf64-1        SINPHKLDQPFPGKARWEKIQRAGSVRDLESIGGVLYAQGWLSAGTHNGRDYALFFRQPV 240

orf64ng-1.pep  PENVAQDAVLIEKARAKYAELSYSKKGLQTFFLVTLLIASLLSIFLALVMALYFARRFVE 300
orf64-1        PKGVAEDAVLIEKARAKYAELSYSKKGLQTFFLATLLIASLLSIFLALVMALYFARRFVE 300

orf64ng-1.pep  PILSLAEGAKAVAQGDFSQTRPVLRNDEFGRLTKLFNHMTEQLSIAKEADERNRRREEAA 360
orf64-1        PVLSLAEGAKAVAQGDFSQTRPVLRNDEFGRLTKLFNHMTEQLSIAKEADERNRRREEAA 360

orf64ng-1.pep  RHYLECVLDGLTTGVVVFDEKGRLKTFNKAAEQILGMPLAPLWGSSRHGWHGVSAQQSLL 420
orf64-1        RHYLECVLEGLTTGVVVFDEQGCLKTFNKAAEQILGMPLTPLWGSSRHGWHGVSAQQSLL 420

orf64ng-1.pep  AEVFAAIGAAAGTDKPVQVEYAAPDDAKILLGKATVLPEDNGNGVVMVIDDITVLIRAQK 480
orf64-1        AEVFAAIGAAAGTDKPVHVKYAAPDDAKILLGKATVLPEDNGNGVVMVIDDITVLIHAQK 480

orf64ng-1.pep  EAAWGEVAKRLAHEIRNPLTPIQLSAERLAWKLGGKLDDQDAQILTRSTDTIIKQVAALK 540
orf64-1        EAAWGEVAKRLAHEIRNPLTPIQLSAERLAWKLGGKLDEQDAQILTRSTDTIVKQVAALK 540

orf64ng-1.pep  EMVEAFRNYARAPSLKLENQDLNALIGDVLALYEAGPCRFEAELAGEPLMMAADTTAMRQ 600
orf64-1        EMVEAFRNYARSPSLKLENQDLNALIGDVLALYEAGPCRFAAELAGEPLTVAADTTAMRQ 600

orf64ng-1.pep  VLHNIFKNAAEAAEEADMPEVRVKSETGQDGRIVLTVCDNGKGFGKEMLHNAFEPYVTDK 660
orf64-1        VLHNIFKNAAEAAEEADVPEVRVKSETGQDGRIVLTVCDNGKGFGREMLHNAFEPYVTDK 660

orf64ng-1.pep  PAGTGLGLPVVKKIIGEHGGRISLSNQDAGGACVRIILPKTVETYAX 707
orf64-1        PAGTGLGLPVVKKIIEEHGGRISLSNQDAGGACVRIILPKTVKTYAX 707

Furthermore, ORF64ng-1 shows significant homology to a protein from A. caulinodans:

sp|Q04850|NTRY_AZOCA NITROGEN REGULATION PROTEIN NTRY >gi|77479|pir||S18624 ntrY
protein - Azorhizobium caulinodans >gi 138737 (X63841) NtrY gene product
[Azorhizobium caulinodans] Length = 771
Score = 218 bits (550), Expect = 7e-56
Identities = 195/720 (27%), Positives = 320/720 (44%), Gaps = 58/720 (8%)

Query: 7   IAAICAVVLLYGLTAATGSTSSLADYFWWIXXXXXXXXXXXXXXXXRYVILLLKDRRNGV 66
           I+A+ ++L GLT + + + R++KRG
Sbjct: 35  ISALATFLILMGLTPWPTHQWIS--VLLVNAAAVLILSAMVGREIWRIAKARARGR 90

Query: 67  FGSQIAKRLSGMFTLVAVLPGLFLFGISAQFINGTINSWFGNDTHEALERSLNLSKSALD 126
           +++ R+ G+F +V+V+P + + +++ ++ ++ WF T E + S++++++ +
Sbjct: 91  AAARLHIRIVGLFAWSWPAILVAWASLTLDRGLDRWFSMRTQEIVASSVSVAQTYVR 150

Query: 127 LAADNAVSNAVPVQIDLIGTASLSGNMGSVLEHYAG--SGFAQLALYNAASGKIEKSINP 184
           AN+ + +DL S+ YGSFQ+ AA + ++
Sbjct: 151 EHALNIRGDILAMSADLTRLKSV YEGDRSRFNQILTAQAALRNLPGAMLI 200

Query: 185 HQFDQPLPDKEHWEQIQQTGSVRSLESIGGVLYAQGWLSAGTHNGRDYA 233
           + D++++ 1+ V+ +IG Q + N DY
Sbjct: 201 RR-DLSVVERAN-VNIGREFIVPANLAIGDATPDQPVIYLP--NDADYVAAWPLKDYDD 256

Query: 234 --LFFRQPIPENVAQDAVLIEKARAKYAELSYSKKGLQTFFLVTXXXXXXXXXXXXXVMA 291
           L++IV ++AYL+ G+Q F + +
Sbjct: 257 LYLYVARLIDPRVIGYLKTTQETLADYRSLEERRFGVQVAFALMYAVITLIVLLSAVWLG 316

Query: 292 LYFARRFVEPILSLAEGAKAVAQGDFSQTRPVLRND-EFGRLTKLFNHMTEQLSIXXXXX 350
           L F++ V PI L A VA+G+ P+ R + + L + FN MT +L
Sbjct: 317 LNFSKWLVAPIRRLMSAADHVAEGNLDVRVPIYRAEGDLASLAETFNKMTHELRSQREAI 376

Query: 351 XXXXXXXXXXXHYLECVLDGLTTGVVVFDEKGRLKTFNKAAEQILGMPLAPLWGSSRHGW 410
           + E VL G+ GV+ D + R+ N++AE++LG L+ + RH
Sbjct: 377 LTARDQIDSRRRFTEAVLSGVGAGVIGLDSQERITILNRSAERLLG--LSEVEALHRHLA 434

Query: 411 HGVSAQQSLLAEVFXXXXXXXXTDKPVQVEYAAPDDAKILLGKATVLPEDNG NGVVM 467
           V LL E + VQ D + + V E + +G V+
Sbjct: 435 EWPETAGLLEEA--EHARQRSVQGNITLTRDGRERVFAVRVTTEQSPEAEHGWW 488

Query: 468 VIDDITVLIRAQKEAAWGEVAKRLAHEIRNPLTPIQLSAERLAWKLGGKLDDQDAQILTR 527
           +DDIT LI AQ+ +AW +VA+R+AHEI+NPLTPIQLSAERL K G + QD +1 +
Sbjct: 489 TLDDITELISAQRTSAWADVARRIAHEIKNPLTPIQLSAERLKRKFGRHV-TQDREIFDQ 547

Query: 528 STDTIIKQVAALKEMVEAFRNYARAPSLKLENQDLNALIGDVLALYEAGPCRFEAELAGE 587
           TDTII+QV + MV+ F ++AR P +++QD++ +1 + L G +
Sbjct: 548 CTDTIIRQVGDIGRMVDEFSSFARMPKPWDSQDMSEIIRQTVFLMRVGHPEWFDSEVP 607

Query: 588 PLMMAA-DTTAMRQVLHNIFKNXXXXXXXXDMPEVRVKSETGQDGRIVLTVCD 639
           PMA D +QLNIKN P+VR + + G+D +V+ + D
Sbjct: 608 PAMPARFDRRLVSQALTNILKNAAEAIEAVP-PDVRGQGRIRVSANRVGED--LVIDIID 664

Query: 640 NGKGFGKEMLHNAFEPYVTDKPAGTGLGLPVVKKIIGEHGGRISLSNQDAG-GACVRIIL 698
           NG G +E + EPYVT + GTGLGL +V KI+ EHGG I L++ G GA +R+ L
Sbjct: 665 NGTGLPQESRNRLLEPYVTTREKGTGLGLAIVGKIMEEHGGGIELNDAPEGRGAWIRLTL 724



Based on this analysis, including the presence of a putative leader sequence (double-underlined) 
and several putative transmembrane domains (single-underlined) in the gonococcal protein, it is 
predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be 
useful antigens for vaccines or diagnostics, or for raising antibodies. 



Example 31 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 259>:

1 ATGTACGCAT TTACCGCCGC ACAGCAACAG AAGGCACTCT TCCGGCTGGT 

51 GCTTTTTCAT ATCCTCATCA TCGCCGCCAG CAACTATCTG GTGCAGTTCC 

101 CTTTCCAAAT TTTCGGCATC CACACCACTT GGGGCGCATT TTCCTTTCCC 

151 TTCATCTTCC TTGCCACCGA CCTGACCGTC CGCATTTTCG GTTCTCACTT 

201 GGCACGGCGG ATTATCTTTT GGGTGATGTT CCCCGCCCTT TTGCTTTCCT 

251 ACGTCTTTTC CGTTTTGTTC CACAACGGCA GTTGGACAGG CTTGGGCGCG 

301 CTGTCCGAAT TCAACACCTT TGTCGGACGC ATCGCCTTAG CCAGCTTTGC 

351 CGCCTACGCG ATCGGACAAA TCCTTGATAT TTTTGTATTC AACAAATTAC 

401 GCCGTCTGAA AGCGTGGTGG ATTGCACCGA ACGCATCAAC CGTCATCGGG 

451 CACGCGTTGG ATACG...

This corresponds to the amino acid sequence <SEQ ID 260; ORF66>: 

1 MYAFTAAQQQ KALFRLVLFH ILIIAASNYL VQFPFQIFGI HTTWGAFSFP 

51 FIFLATDLTV RIFGSHLARR IIFWVMFPAL LLSYVFSVLF HNGSWTGLGA 

101 LSEFNTFVGR IALASFAAYA IGQILDIFVF NKLRRLKAWW IAPNASTVIG 

151 HALDT... 

Further work revealed the complete nucleotide sequence <SEQ ID 261>:

1 ATGTACGCAT TTACCGCCGC ACAGCAACAG AAGGCACTCT TCCGGCTGGT 

51 GCTTTTTCAT ATCCTCATCA TCGCCGCCAG CAACTATCTG GTGCAGTTCC 

101 CTTTCCAAAT TTTCGGCATC CACACCACTT GGGGCGCATT TTCCTTTCCC 

151 TTCATCTTCC TTGCCACCGA CCTGACCGTC CGCATTTTCG GTTCTCACTT 

201 GGCACGGCGG ATTATCTTTT GGGTGATGTT CCCCGCCCTT TTGCTTTCCT 

251 ACGTCTTTTC CGTTTTGTTC CACAACGGCA GTTGGACAGG CTTGGGCGCG 

301 CTGTCCGAAT TCAACACCTT TGTCGGACGC ATCGCCTTAG CCAGCTTTGC 

351 CGCCTACGCG ATCGGACAAA TCCTTGATAT TTTTGTATTC AACAAATTAC 

401 GCCGTCTGAA AGCGTGGTGG ATTGCACCGA CCGCATCAAC CGTCATCGGC 

451 AACGCCTTGG ATACGCTGGT ATTTTTCGCC GTTGCCTTCT ACGCAAGCAG 

501 CGATGGATTT ATGGCGGCAA ACTGGCAGGG CATCGCTTTT GTCGATTACC 

551 TGTTCAAACT TACCGTCTGC ACCCTCTTCT TCCTGCCCGC CTACGGCGTG 

601 ATACTGAATC TGCTGACGAA AAAACTGACA ACCCTGCAAA CCAAACAGGC 

651 GCAAGACCGC CCCGCGCCCT CGCTGCAAAA TCCGTAA 

This corresponds to the amino acid sequence <SEQ ID 262; ORF66-l>: 

1 MYAFTAAQQQ KALFRLVLFH ILIIAASNYL VQFPFQIFGI HTTWGAFSFP

51 FIFLATDLTV RIFGSHLARR IIFWVMFPAL LLSYVFSVLF HNGSWTGLGA

101 LSEFNTFVGR IALASFAAYA IGQILDIFVF NKLRRLKAWW IAPTASTVIG

151 NALDTLVFFA VAFYASSDGF MAANWQGIAF VDYLFKLTVC TLFFLPAYGV

201 ILNLLTKKLT TLQTKQAQDR PAPSLQNP*



Computer analysis of this amino acid sequence gave the following results: 
Homology with the hypothetical protein o221 of E. coli (accession number P37619) 



ORF66 and o221 protein show 67% aa identity in 155aa overlap: 
orf66    1 MYAFTAAQQQKALFRLVLFHILIIAASNYLVQFPFQIFGIHTTWGAFSFPFIFLATDLTV 60
           M F+ Q+ KALF L LFH+L+I +SNYLVQ PIG HTTWGAFSFPFIFLATDLTV
o221     1 MNVFSQTQRYKALFWLSLFHLLVITSSNYLVQLPVSILGFHTTWGAFSFPFIFLATDLTV 60

orf66   61 RIFGSHLARRIIFWVMFPALLLSYVFSVLFHNGSWTGLGALSEFNTFVGRIALASFAAYA 120
           RIFG+ LARRIIF VM PALL+SYV S LF+ GSW G GAL+ FN FV RIA ASF AYA
o221    61 RIFGAPLARRIIFAVMIPALLISYVISSLFYMGSWQGFGALAHFNLFVARIATASFMAYA 120

orf66  121 IGQILDIFVFNKLRRLKAWWIAPNASTVIGHALDT 155
           +GQILD+ VFN+LR+ + WW+AP AST+ G+ DT
o221   121 LGQILDVHVFNRLRQSRRWWLAPTASTLFGNVSDT 155



Homology with a predicted ORF from N. meningitidis (strain A)

ORF66 shows 96.1% identity over a 155aa overlap with an ORF (ORF66a) from strain A of N. meningitidis:

orf66.pep  MYAFTAAQQQKALFRLVLFHILIIAASNYLVQFPFQIFGIHTTWGAFSFPFIFLATDLTV  60
orf66a     MYAFTAAQQQKALFWLVLFHILIIAASNYLVQFPFQISGIHTTWGAFSFPFIFLATDLTV  60

orf66.pep  RIFGSHLARRIIFWVMFPALLLSYVFSVLFHNGSWTGLGALSEFNTFVGRIALASFAAYA 120
orf66a     RIFGSHLARRIIFWVMFPALLLSYVFSVLFHNGSWTGLGALSEFNTFVGRIALASFAAYA 120

orf66.pep  IGQILDIFVFNKLRRLKAWWIAPNASTVIGHALDT 155
orf66a     LGQILDIFVFNKLRRLKAWWVAPTASTVIGNALDTLVFFAVAFYASSDGFMAANWQGIAF 180

orf66a     VDYLFKLTVCGLFFLPAYGVILNLLTKKLTTLQTKQAQDRPAPSLQNPX 229
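The identity figures in this example are the fraction of identical residues in the overlap. The sketch below recomputes the 96.1% figure for ORF66 against ORF66a. It assumes a gap-free overlap and compares residues literally, so an X matches only an X. The document does not name the alignment program or scoring rules it used, so treat this as an illustration of the arithmetic rather than the authors' method. The names `percent_identity`, `ORF66` and `ORF66A_OVERLAP` are hypothetical.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent identity over a gap-free overlap of two aligned sequences.

    Assumes both strings are aligned position by position with no gaps;
    residues are compared literally, so 'X' only matches 'X'.
    """
    if not seq_a or len(seq_a) != len(seq_b):
        raise ValueError("expected two non-empty sequences of equal length")
    matches = sum(1 for a, b in zip(seq_a, seq_b) if a == b)
    return 100.0 * matches / len(seq_a)


# Residues 1-155 of ORF66 <SEQ ID 260> and of ORF66a <SEQ ID 264>
ORF66 = (
    "MYAFTAAQQQKALFRLVLFHILIIAASNYLVQFPFQIFGIHTTWGAFSFP"
    "FIFLATDLTVRIFGSHLARRIIFWVMFPALLLSYVFSVLFHNGSWTGLGA"
    "LSEFNTFVGRIALASFAAYAIGQILDIFVFNKLRRLKAWWIAPNASTVIG"
    "HALDT"
)
ORF66A_OVERLAP = (
    "MYAFTAAQQQKALFWLVLFHILIIAASNYLVQFPFQISGIHTTWGAFSFP"
    "FIFLATDLTVRIFGSHLARRIIFWVMFPALLLSYVFSVLFHNGSWTGLGA"
    "LSEFNTFVGRIALASFAAYALGQILDIFVFNKLRRLKAWWVAPTASTVIG"
    "NALDT"
)

if __name__ == "__main__":
    pid = percent_identity(ORF66, ORF66A_OVERLAP)
    print(f"{pid:.1f}% identity in {len(ORF66)} aa overlap")
```

Six of the 155 positions differ (15, 38, 121, 141, 144 and 151). That gives 149/155, which reproduces the quoted 96.1%. Gapped overlaps, such as the ORF64ng/ORF64 comparison, would need an explicit rule for gap columns, and this sketch does not provide one.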

The complete length ORF66a nucleotide sequence <SEQ ID 263> is: 



1 ATGTACGCAT TTACCGCCGC ACAGCAACAG AAGGCACTCT TCTGGCTGGT 

51 GCTTTTTCAT ATCCTCATCA TCGCCGCCAG CAACTATCTG GTGCAGTTCC 

101 CCTTCCAAAT TTCCGGCATC CACACCACTT GGGGCGCGTT TTCCTTTCCC 

151 TTCATCTTCC TCGCCACCGA CCTGACCGTC CGCATTTTCG GTTCGCACTT 

201 GGCACGGCGG ATTATCTTTT GGGTCATGTT CCCCGCCCTT TTGCTTTCCT 

251 ACGTCTTTTC CGTTTTGTTC CACAACGGCA GTTGGACGGG CTTGGGCGCG 

301 CTGTCCGAAT TCAACACCTT TGTCGGACGC ATCGCGCTGG CAAGTTTTGC 

351 CGCCTACGCG CTCGGACAAA TCCTTGATAT TTTTGTGTTC AACAAATTAC 

401 GCCGTCTGAA AGCGTGGTGG GTTGCCCCGA CTGCATCAAC CGTCATCGGC 

451 AACGCCTTAG ATACGTTGGT ATTTTTCGCC GTTGCCTTCT ACGCAAGCAG 

501 CGATGGATTT ATGGCGGCAA ACTGGCAGGG CATCGCTTTT GTCGATTACC 

551 TGTTCAAACT CACCGTCTGC GGTCTGTTTT TCCTGCCCGC CTACGGCGTG 

601 ATTCTGAATC TGCTGACGAA AAAACTGACG ACCCTGCAAA CCAAACAGGC 

651 GCAAGACCGC CCCGCGCCCT CGCTGCAAAA TCCGTAA 

This encodes a protein having amino acid sequence <SEQ ID 264>: 



1 MYAFTAAQQQ KALFWLVLFH ILIIAASNYL VQFPFQISGI HTTWGAFSFP

51 FIFLATDLTV RIFGSHLARR IIFWVMFPAL LLSYVFSVLF HNGSWTGLGA

101 LSEFNTFVGR IALASFAAYA LGQILDIFVF NKLRRLKAWW VAPTASTVIG

151 NALDTLVFFA VAFYASSDGF MAANWQGIAF VDYLFKLTVC GLFFLPAYGV

201 ILNLLTKKLT TLQTKQAQDR PAPSLQNP*

ORF66a and ORF66-1 show 97.8% identity in 228 aa overlap: 



10 20 30 40 50 60 

orf 66a . pep MYAFTAAQQQKALFWLVLFHILIIAASNYLVQFPFQISGIHTTWGAFSFPFI FLATDLTV 
I I I I I I I I I I I I I I I I i I I I I I I I I I I I I I I I I I I I I I I I I I I I I 1 I I I I I I I I I I I I 
orf 66-1 MYAFTAAQQQKALFRLVLFHILIIAASNYLVQFPFQIFG I HTTWGAFSFPFIFLATDLTV 
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10 20 30 40 50 60 

70 80 90 100 110 120 

RIFGSHLARRIIFWVMFPALLLSYVFSVLFHNGSWTGLGALSEFNTFVGRIALASFAAYA 

I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I i I I I I I I I I I I i I I I 
RIFGSHLARRIIFWVMFPALLLSYVFSVLFHNGSWTGLGALSEFNTFVGRIALASFAAYA 
70 80 . 90 100 110 120 

130 140 150 160 170 180 

LGQILDIFVFNKLRRLKAWWVAPTASTVIGNALDTLVFFAVAFYASSDGFMAANWQGIAF 

: I I I I I I I I I I I I I I I I I I I :l I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
IGQILDIFVFNKLRRLKAWWIAPTASTVIGNALDTLVFFAVAFYASSDGFMAANWQGIAF 
130 140 150 160 170 180 

190 200 210 220 229 

VD YL FKLTVCGL FFL PAYGV I LNLLTKKLTTLQTKQAQDRPAP S LQN PX 

I I I II I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I II I I I I I I 
VDYLFKLTVCTLFFLPAYGVILNLLTKKLTTLQTKQAQDRPAPSLQNPX 
190 200 210 220 

Homology with a predicted ORF from N. gonorrhoeae

ORF66 shows 94.2% identity over a 155aa overlap with a predicted ORF (ORF66.ng) from N.
gonorrhoeae:

orf66.pep  MYAFTAAQQQKALFRLVLFHILIIAASNYLVQFPFQIFGIHTTWGAFSFPFIFLATDLTV  60
orf66ng    MYALTAAQQQKALFRLVLFHILIIAASNYLVQFPFRIFGIHTTWGAFSFPFIFLATDLTV  60

orf66.pep  RIFGSHLARRIIFWVMFPALLLSYVFSVLFHNGSWTGLGALSEFNTFVGRIALASFAAYA 120
orf66ng    RIFGSHLARRIIFWVMFPALSLSYVFSVLFHNGSWTGLGAPSQFNTFVGRIALASFAAYA 120

orf66.pep  IGQILDIFVFNKLRRLKAWWIAPNASTVIGHALDT 155
orf66ng    LGQILDIFVFDKLRRLKAWWIAPAASTVIGNALDTLVFFAVAFYASSDEFMAANWQGIAF 180

The complete length ORF66ng nucleotide sequence <SEQ ID 265> is: 

1 ATGTACGCAT TGACCGCCGC ACAGCAACAG AAGGCACTCT TCCGGCTGGT 

51 GCTTTTCCAT ATCCTCATCA TCGCCGCCAG CAACTATCTG GTGCAGTTCC 

101 CCTTCCGGAT TTTCGGCATC CACACCACTT GGGGCGCGTT TTCCTTTCCC 

151 TTCATCTTCC TCGCCACCGA CCTGACCGTC CGCATTTTCG GTTCGCACTT 

201 GGCGCGGCGG ATTATCTTTT GGGTGATGTT CCCCGCCCTT ttgCTTTcat 

251 aCGTCTTTTC CGTTTTGTTC CACAACGGCA GTTGGACGGG CTTGGGCGCG 

301 ctgTCCCAAT TCAACACCTT TGTCGGACGC ATCGCGCTGG CAAGTTTTGC 

351 CGCCTACGCG CTCGGACAAA TCCTTGATAT TTTCGTATTC GACAAATTAC 

401 GCCGTCTGAA AGCGTGGTGG ATTGCCCCGG CCGCATCAAC CGTCATCGGC 

451 AATGCACTGG ACACGTTAGT ATTTTTTGCC GTTGCCTTTT ACGCAAGCAG 

501 CGATGAATTT ATGGCGGCAA ACTGGCAGGG CATCGCTTTT GTCGATTACC 

551 TGTTCAAACT TACCGTCTGC ACCCTCTTCT TCCTGCCCGC CTACGGCGTG 

601 ATACTGAATC TGCTGACGAA AAAACTGACG GCCCTGCAAA CCAAACAGGC 

651 GCAAGACCGC CCCGTGCCCT CGCTGCAAAA TCCGTAA 

This encodes a protein having amino acid sequence <SEQ ID 266>: 

1 MYALTAAQQQ KALFRLVLFH ILIIAASNYL VQFPFRIFGI HTTWGAFSFP

51 FIFLATDLTV RIFGSHLARR IIFWVMFPAL SLSYVFSVLF HNGSWTGLGA

101 PSQFNTFVGR IALASFAAYA LGQILDIFVF DKLRRLKAWW IAPAASTVIG

151 NALDTLVFFA VAFYASSDEF MAANWQGIAF VDYLFKLTVC TLFFLPAYGV

201 ILNLLTKKLT ALQTKQAQDR PVPSLQNP*

An alternative annotated sequence is:

1 MYALTAAQQQ KALFRLVLFH ILIIAASNYL VQFPFRIFGI HTTWGAFSFP

51 FIFLATDLTV RIFGSHLARR IIFWVMFPAL LLSYVFSVLF HNGSWTGLGA

101 LSQFNTFVGR IALASFAAYA LGQILDIFVF DKLRRLKAWW IAPAASTVIG

151 NALDTLVFFA VAFYASSDEF MAANWQGIAF VDYLFKLTVC TLFFLPAYGV

201 ILNLLTKKLT ALQTKQAQDR PVPSLQNP*



ORF66ng and ORF66-1 show 96.1% identity in 228 aa overlap:

orf66-1.pep  MYAFTAAQQQKALFRLVLFHILIIAASNYLVQFPFQIFGIHTTWGAFSFPFIFLATDLTV  60
orf66ng      MYALTAAQQQKALFRLVLFHILIIAASNYLVQFPFRIFGIHTTWGAFSFPFIFLATDLTV  60

orf66-1.pep  RIFGSHLARRIIFWVMFPALLLSYVFSVLFHNGSWTGLGALSEFNTFVGRIALASFAAYA 120
orf66ng      RIFGSHLARRIIFWVMFPALLLSYVFSVLFHNGSWTGLGALSQFNTFVGRIALASFAAYA 120

orf66-1.pep  IGQILDIFVFNKLRRLKAWWIAPTASTVIGNALDTLVFFAVAFYASSDGFMAANWQGIAF 180
orf66ng      LGQILDIFVFDKLRRLKAWWIAPAASTVIGNALDTLVFFAVAFYASSDEFMAANWQGIAF 180

orf66-1.pep  VDYLFKLTVCTLFFLPAYGVILNLLTKKLTTLQTKQAQDRPAPSLQNPX 229
orf66ng      VDYLFKLTVCTLFFLPAYGVILNLLTKKLTALQTKQAQDRPVPSLQNPX 229

Furthermore, ORF66ng shows significant homology with an E. coli ORF:

sp|P37619|YHHQ_ECOLI HYPOTHETICAL 25.3 KD PROTEIN IN FTSY-NIKA INTERGENIC
REGION (O221)
>gi|1073495|pir||S47690 hypothetical protein o221 - Escherichia coli >gi|466607
(U00039) No definition line found [Escherichia coli] >gi|1789882 (AE000423)
hypothetical 25.3 kD protein in ftsY-nikA intergenic region [Escherichia coli]
Length = 221
Score = 273 bits (692), Expect = 5e-73
Identities = 132/203 (65%), Positives = 155/203 (76%)

Query: 1   MYALTAAQQQKALFRLVLFHILIIAASNYLVQFPFRIFGIHTTWGAFSFPFIFLATDLTV 60
           M + Q+ KALF L LFH+L+I +SNYLVQ PIG HTTWGAFSFPFIFLATDLTV
Sbjct: 1   MNVFSQTQRYKALFWLSLFHLLVITSSNYLVQLPVSILGFHTTWGAFSFPFIFLATDLTV 60

Query: 61  RIFGSHLARRIIFWVMFPALLLSYVFSVLFHNGSWTGLGALSQFNTFVGRIALASFAAYA 120
           RIFG+ LARRIIF VM PALL+SYV S LF+ GSW G GAL+ FN FV RIA ASF AYA
Sbjct: 61  RIFGAPLARRIIFAVMIPALLISYVISSLFYMGSWQGFGALAHFNLFVARIATASFMAYA 120

Query: 121 LGQILDIFVFDKLRRLKAWWIAPAASTVIGNALDTLVFFAVAFYASSDEFMAANWQGIAF 180
           LGQILD+ VF++LR+ + WW+AP AST+ GN DTL FF +AF+ S D FMA +W IA
Sbjct: 121 LGQILDVHVFNRLRQSRRWWLAPTASTLFGNVSDTLAFFFIAFWRSPDAFMAEHWMEIAL 180

Query: 181 VDYLFKLTVCTLFFLPAYGVILN 203
           VDY FK+ + +FFLP YGV+LN
Sbjct: 181 VDYCFKVLISIVFFLPMYGVLLN 203

Based on this analysis, including the homology with the E. coli protein and the presence of several
putative transmembrane domains in the gonococcal protein, it is predicted that these proteins from
N. meningitidis and N. gonorrhoeae, and their epitopes, could be useful antigens for vaccines or
diagnostics, or for raising antibodies.

Example 32 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 267>:

1 ATGGTCATAA AATATACAAA TTTGAATTTT GCGAAATTGT CGATAATTGC 

51 AATTTTGATG ATGTATTCGT TTGAAGCGAA TGCAAAyGCA GTmwrAATAT

101 CTGAAACTGT TTCAGTTGAT ACCGGACAAG GTGCGAAAAT TCATAAGTTT 

151 GTACCTAAAA ATAGTAAAAC TTATTCATCT GATTTAATAA AAACGGTAGA 

201 TTTAACACAC AyyCCTACGG GCGCAAAAGC CCGAATCAAC GCCAAAATAA 

251 CCGCCAGCGT ATCCCGCGCC GGCGTATTGG CGGGGGTCGG CAAACTTGCC 

301 CGCTTAGgCG CGAAATTCAG CACAAGGGCG GTtCCCTATG TCGGAACAGC

351 CcTTTTAGCC CACGACGTAT ACGAAAcTTT CAAAGAAGAC ATACAGGCAC 

401 GAGGCTACCA ATACGACCCC GAAACCGACA AATTTGTAAA AGGCTACGAA 

451 TATAGTAATT GCCTTTGGTA CGAAGACAAA AGACGTATTA ATAGAACCTA 
501 TGGCTGCTAC GGCGTTGAT...

This corresponds to the amino acid sequence <SEQ ID 268; ORF72>: 

1 MVIKYTNLNF AKLSIIAILM MYSFEANANA VXISETVSVD TGQGAKIHKF 

51 VPKNSKTYSS DLIKTVDLTH XPTGAKARIN AKITASVSRA GVLAGVGKLA 

101 RLGAKFSTRA VPYVGTALLA HDVYETFKED IQARGYQYDP ETDKFVKGYE 

151 YSNCLWYEDK RRINRTYGCY GVD...
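The partial ORF72 listing contains IUPAC ambiguity codes (y, m, w, r). Its translation gives a definite residue where every possible reading of a codon encodes the same amino acid, and X where the readings disagree. For example, AAy gives N but wrA gives X. The sketch below shows that rule under stated assumptions: the standard genetic code (NCBI translation table 1), reading frame 1, and an X for any codon whose expansions disagree. It is not presented as the procedure used to prepare these listings, and the names `translate`, `translate_codon`, `CODON_TABLE` and `IUPAC` are hypothetical.

```python
import re
from itertools import product

# Standard genetic code (NCBI translation table 1); codons enumerated in T, C, A, G order
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}

# IUPAC nucleotide codes mapped to the concrete bases each may stand for
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def translate_codon(codon: str) -> str:
    """Return the single residue all expansions of the codon agree on, else 'X'."""
    if any(base not in IUPAC for base in codon):
        return "X"
    residues = {
        CODON_TABLE["".join(bases)]
        for bases in product(*(IUPAC[base] for base in codon))
    }
    return residues.pop() if len(residues) == 1 else "X"


def translate(listing: str) -> str:
    """Translate a listing in frame 1, ignoring position numbers, spaces and case.

    A trailing partial codon is dropped; stop codons are rendered as '*'.
    """
    seq = re.sub(r"[^A-Za-z]", "", listing).upper()
    usable = len(seq) - len(seq) % 3
    return "".join(translate_codon(seq[i:i + 3]) for i in range(0, usable, 3))


if __name__ == "__main__":
    # First 100 nt of the partial ORF72 listing <SEQ ID 267>
    partial = """1 ATGGTCATAA AATATACAAA TTTGAATTTT GCGAAATTGT CGATAATTGC
    51 AATTTTGATG ATGTATTCGT TTGAAGCGAA TGCAAAyGCA GTmwrAATAT"""
    print(translate(partial))
```

On the first 100 nt of <SEQ ID 267>, the sketch returns MVIKYTNLNFAKLSIIAILMMYSFEANANAVXI, which matches residues 1-33 of ORF72. The X at residue 71 follows from the same rule, because the codon Ayy can read as ACC/ACT (T) or ATC/ATT (I). Translating nucleotides 421-453 of <SEQ ID 269> the same way gives ETDKFAKVSG*, the C-terminus of ORF72-1.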

Further work revealed the complete nucleotide sequence <SEQ ID 269>: 



1 ATGGTCATAA AATATACAAA TTTGAATTTT GCGAAATTGT CGATAATTGC 

51 AATTTTGATG ATGTATTCGT TTGAAGCGAA TGCAAATGCA GTAAAAATAT 

101 CTGAAACTGT TTCAGTTGAT ACCGGACAAG GTGCGAAAAT TCATAAGTTT 

151 GTACCTAAAA ATAGTAAAAC TTATTCATCT GATTTAATAA AAACGGTAGA 

201 TTTAACACAC ATCCCTACGG GCGCAAAAGC CCGAATCAAC GCCAAAATAA 

251 CCGCCAGCGT ATCCCGCGCC GGCGTATTGG CGGGGGTCGG CAAACTTGCC 

301 CGCTTAGGCG CGAAATTCAG CACAAGGGCG GTTCCCTATG TCGGAACAGC 

351 CCTTTTAGCC CACGACGTAT ACGAAACTTT CAAAGAAGAC ATACAGGCAC 

401 GAGGCTACCA ATACGACCCC GAAACCGACA AATTTGCAAA GGTCTCAGGC 

451 TAA 

This corresponds to the amino acid sequence <SEQ ID 270; ORF72-l>: 



1 MVIKYTNLNF AKLSIIAILM MYSFEANANA VKISETVSVD TGQGAKIHKF
51 VPKNSKTYSS DLIKTVDLTH IPTGAKARIN AKITASVSRA GVLAGVGKLA 
101 RLGAKFSTRA VPYVGTALLA HDVYETFKED IQARGYQYDP ETDKFAKVSG 
151 * 

Computer analysis of this amino acid sequence gave the following results: 



Homology with a predicted ORF from N. meningitidis (strain A) 

ORF72 shows 98.0% identity over a 147aa overlap with an ORF (ORF72a) from strain A of N.
meningitidis:

orf72.pep  MVIKYTNLNFAKLSIIAILMMYSFEANANAVXISETVSVDTGQGAKIHKFVPKNSKTYSS  60
orf72a     MVIKYTNLNFAKLSIIAILMMYSFEANANAVKISETVSVDTGQGAKIHKFVPKNSKTYSS  60

orf72.pep  DLIKTVDLTHXPTGAKARINAKITASVSRAGVLAGVGKLARLGAKFSTRAVPYVGTALLA 120
orf72a     DLIKTVDLTHIPTGAKARINAKITASVSRAGVLAGVGKLARLGAKFSTRAVPYVGTALLA 120

orf72.pep  HDVYETFKEDIQARGYQYDPETDKFVKGYEYSNCLWYEDKRRINRTYGCYGVD 173
orf72a     HDVYETFKEDIQARGYQYDPETDKFAKVSGX 151

The complete length ORF72a nucleotide sequence <SEQ ID 271> is:



1 ATGGTCATAA AATATACAAA TTTGAATTTT GCGAAATTGT CGATAATTGC 

51 AATTTTGATG ATGTATTCGT TTGAAGCGAA TGCAAATGCA GTAAAAATAT 

101 CTGAAACTGT TTCAGTTGAT ACCGGACAAG GTGCGAAAAT TCATAAGTTT 

151 GTACCTAAAA ATAGTAAAAC TTATTCATCT GATTTAATAA AAACGGTAGA 

201 TTTAACACAC ATCCCTACGG GCGCAAAAGC CCGAATCAAC GCCAAAATAA 

251 CCGCCAGCGT ATCCCGCGCC GGCGTATTGG CGGGGGTCGG CAAACTTGCC 

301 CGCTTAGGCG CGAAATTCAG CACAAGGGCG GTTCCCTATG TCGGAACAGC 

351 CCTTTTAGCC CACGACGTAT ACGAAACTTT CAAAGAAGAC ATACAGGCAC 

401 GAGGCTACCA ATACGACCCC GAAACCGACA AATTTGCAAA GGTCTCAGGC 

451 TAA 



This encodes a protein having amino acid sequence <SEQ ID 272>: 
1 MVIKYTNLNF AKLSIIAILM MYSFEANANA VKISETVSVD TGQGAKIHKF
51 VPKNSKTYSS DLIKTVDLTH IPTGAKARIN AKITASVSRA GVLAGVGKLA 
101 RLGAKFSTRA VPYVGTALLA HDVYETFKED IQARGYQYDP ETDKFAKVSG 
151 * 

ORF72a and ORF72-1 show 100.0% identity in 150 aa overlap:

orf72a.pep  MVIKYTNLNFAKLSIIAILMMYSFEANANAVKISETVSVDTGQGAKIHKFVPKNSKTYSS  60
orf72-1     MVIKYTNLNFAKLSIIAILMMYSFEANANAVKISETVSVDTGQGAKIHKFVPKNSKTYSS  60

orf72a.pep  DLIKTVDLTHIPTGAKARINAKITASVSRAGVLAGVGKLARLGAKFSTRAVPYVGTALLA 120
orf72-1     DLIKTVDLTHIPTGAKARINAKITASVSRAGVLAGVGKLARLGAKFSTRAVPYVGTALLA 120

orf72a.pep  HDVYETFKEDIQARGYQYDPETDKFAKVSGX 151
orf72-1     HDVYETFKEDIQARGYQYDPETDKFAKVSGX 151

Homology with a predicted ORF from N. gonorrhoeae 

ORF72 shows 89% identity over a 173aa overlap with a predicted ORF (ORF72.ng) from N. 
gonorrhoeae: 

orf72.pep  MVIKYTNLNFAKLSIIAILMMYSFEANANAVXISETVSVDTGQGAKIHKFVPKNSKTYSS  60
orf72ng    MVTKHTNLNFAKLSIIAILMMYSFEANANAVKISETLSVDTGQGAKVHKFVPKSSNIYSS  60

orf72.pep  DLIKTVDLTHXPTGAKARINAKITASVSRAGVLAGVGKLARLGAKFSTRAVPYVGTALLA 120
orf72ng    DLTKAVDLTHIPTGAKARINAKITASVSRAGVLSGVGKLVRQGAKFGTRAVPYVGTALLA 120

orf72.pep  HDVYETFKEDIQARGYQYDPETDKFVKGYEYSNCLWYEDKRRINRTYGCYGVD 173
orf72ng    HDVYETFKEDIQARGCRYDPETDKFVKGYEYANCLWYEDERRINRTYGCYGVDSSIMRLM 180

An ORF72ng nucleotide sequence <SEQ ID 273> was predicted to encode a protein having amino 
acid sequence <SEQ ID 274>: 

1 MVTKHTNLNF AKLSIIAILM MYSFEANANA VKISETLSVD TGQGAKVHKF

51 VPKSSNIYSS DLTKAVDLTH IPTGAKARIN AKITASVSRA GVLSGVGKLV 

101 RQGAKFGTRA VPYVGTALLA HDVYETFKED IQARGCRYDP ETDKFVKGYE 

151 YANCLWYEDE RRINRTYGCY GVDSSIMRLM PDRSRFPEVK QLMESQMYRL 

201 ARPFWNWRKE ELNKLSSLDW NNFVLNRCTF DWNGGGCAVN KGDDFRAGAS 

251 FSLGRNPKYK EEMDAKKPEE ILSLKVDADP DKYIEATGYP GYSEKVEVAP 

301 GTKVNMGPVT DRNGNPVQVA ATFGRDAQGN TTADVQVIPR PDLTPASAEA

351 PHAQPLPEVS PAENPANNPD PDENPGTRPN PEPDPDLNPD ANPDTDGQPG 

401 TSPDSPAVPD RPNGRHRKER KEGEDGGLSC DYFPEILACQ EMGKPSDRMF 

451 HDISIPQVTD DKTWSSHNFL PSNGVCPQPK TFHVFGRQYR ASYEPLCVFA 

501 EKIRFAVLLA FIIMSAFVVF GSLGGE*



After further analysis, the following gonococcal DNA sequence <SEQ ID 275> was identified: 

1 ATGGTCACAA AACATACAAA TTTGAATTTT GCGAAATTGT CGATAATTGC 

51 AATTTTGATG ATGTATTCGT TTGAAGCGAA TGCAAATGCA GTAAAAATAT 

101 CTGAAACTCT TTCGGTTGAT ACCGGACAAG GCGCGAAAGT TCATAAGTTC 

151 GTTCCTAAAT CAAGTAATAT TTATTCATCT GATTTAACAA AAGCGGTAGA 

201 TTTAACGCAT ATCCCCACGG GCGCAAAAGC CCGAATCAAC GCCAAAATAA 

251 CCGCCAGCGT ATCCCGCGCC GGCGTATTGT CGGGGGTCGG CAAACTTGTC 

301 CGCCAAGGCG CGAAATTCGG CACAAGGGCG GTTCCCTATG TCGGAACAGC 

351 CCTTTTAGCC CACGACGTAT ACGAAACTTT CAAAGAAGAC ATACAGGCAC 

401 GAGGCTGCCG ATACGATCCC GAAACCGACA AATTT 



This corresponds to the amino acid sequence <SEQ ID 276; ORF72ng-1>:

1 MVTKHTNLNF AKLSIIAILM MYSFEANANA VKISETLSVD TGQGAKVHKF
51 VPKSSNIYSS DLTKAVDLTH IPTGAKARIN AKITASVSRA GVLSGVGKLV
101 RQGAKFGTRA VPYVGTALLA HDVYETFKED IQARGCRYDP ETDKF

ORF72ng-1 and ORF72-1 show 89.7% identity in 145 aa overlap:

orf72ng-1.pep  MVTKHTNLNFAKLSIIAILMMYSFEANANAVKISETLSVDTGQGAKVHKFVPKSSNIYSS  60
orf72-1        MVIKYTNLNFAKLSIIAILMMYSFEANANAVKISETVSVDTGQGAKIHKFVPKNSKTYSS  60

orf72ng-1.pep  DLTKAVDLTHIPTGAKARINAKITASVSRAGVLSGVGKLVRQGAKFGTRAVPYVGTALLA 120
orf72-1        DLIKTVDLTHIPTGAKARINAKITASVSRAGVLAGVGKLARLGAKFSTRAVPYVGTALLA 120

orf72ng-1.pep  HDVYETFKEDIQARGCRYDPETDKF 145
orf72-1        HDVYETFKEDIQARGYQYDPETDKFAKVSGX 151



Based on this analysis, including the presence of a putative leader sequence and transmembrane
domains in the gonococcal protein, it is predicted that the proteins from N. meningitidis and
N. gonorrhoeae, and their epitopes, could be useful antigens for vaccines or diagnostics, or for
raising antibodies.






Example 33 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 277>:

1 ATGAGATTTT TCGGTATCGG TTTTTTGGTG CTGCTGTTTT TGGAGATTAT 

51 GTCGATTGTG TGGGTTGCCG ATTGGCTGGG CGGCGGCTGG ACGTTGTTTT 

101 TGATGGCGGC AGGTTTTGCC GCCGGCGTGC TGATGCTCAG GCAAACCGGG 

151 gCTGACCGGT CTTTTATTGG CGGGCGCGGC AATGAGAAGC GGCGGGAAGG 

201 TATCCGTTTA TCAGATGTTG TGGCCTATC. . 

This corresponds to the amino acid sequence <SEQ ID 278; ORF73>: 

1 MRFFGIGFLV LLFLEIMSIV WVADWLGGGW TLFLMAAGFA AGVLMLRQTG 
51 LTGLLLAGAA MRSGGKVSVY QMLWPI . . 

Further work revealed the complete nucleotide sequence <SEQ ID 279>:

1 ATGAGATTTT TCGGTATCGG TTTTTTGGTG CTGCTGTTTT TGGAGATTAT 

51 GTCGATTGTG TGGGTTGCCG ATTGGCTGGG CGGCGGCTGG ACGTTGTTTT 

101 TGATGGCGGC AGGTTTTGCC GCCGGCGTGC TGATGCTCAG GCATACGGGG 

151 CTGTCCGGTC TTTTATTGGC GGGCGCGGCA ATGAGAAGCG GCGGGAGGGT 

201 ATCCGTTTAT CAGATGTTGT GGCCTATCCG TTATACGGTG GCGGCTGTGT 

251 GTCTGATGAG TCCGGGATTC GTATCCTCGG TGTTGGCGGT ATTGCTGCTG 

301 CTGCCGTTTA AGGGAGGGGC AGTGTTGCAG GCAGGAGGTG CGGAAAATTT 

351 TTTCAACATG AACCAATCGG GCAGAAAAGA GGGCTTTTCC CGCGATGACG 

401 ATATTATCGA GGGAGAATAT ACGGTTGAAG AGCCTTACGG CGGCAATCGT 

451 TCCCGAAACG CCATCGAACA CAAAAAAGAC GAATAA 

This corresponds to the amino acid sequence <SEQ ID 280; ORF73-l>: 

1 MRFFGIGFLV LLFLEIMSIV WVADWLGGGW TLFLMAAGFA AGVLMLRHTG 
51 LSGLLLAGAA MRSGGRVSVY QMLWPIRYTV AAVCLMSPGF VSSVLAVLLL
101 LPFKGGAVLQ AGGAENFFNM NQSGRKEGFS RDDDIIEGEY TVEEPYGGNR 
151 SRNAIEHKKD E*

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A) 

ORF73 shows 90.8% identity over a 76aa overlap with an ORF (ORF73a) from strain A of N. 
meningitidis: 

10 20 30 40 50 60 

orf73.pep MRFFGIGFLVLLFLEIMSIVWVADWLGGGWTLFLMAAGFAAGVLMLRQTGLTGLLLAGAA
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I If : I I I : I I i : I I I I I I II 
orf73a MRFFGIGFLVLLFLEIMSIVWVADWLGGGWTLFLMAATFAAGVVMLRHTGLSGLLLAGAA

10 20 30 40 50 60 

70 

orf73.pep MRSGGKVSVYQMLWPI
11111:1111 Mil 

orf73a MRSGGRVSVYXMLWXIRYTVAAVCXMSPGFVSSVXAVLLXLPFKGGAVLQAGGAENFFNM

The complete length ORF73a nucleotide sequence <SEQ ID 281> is:

1 ATGAGATTTT TCGGTATCGG TTTTTTGGTG CTGCTGTTTT TGGAGATTAT 

51 GTCGATTGTG TGGGTTGCCG ATTGGTTGGG CGGCGGTTGG ACGCTGTTTC 

101 TAATGGCGGC AACCTTTGCC GCCGGCGTGG TGATGCTCAG GCATACGGGG 

151 CTGTCCGGTC TTTTATTGGC GGGCGCGGCA ATGAGAAGCG GCGGGAGGGT 

201 ATCCGTTTAT CANATGTTGT GGCNTATCCG TTATACGGTG GCGGCGGTGT 

251 GTCNGATGAG TCCGGGATTC GTATCCTCGG TGTNGGCGGT ATTGCTGNTG 

301 CTNCCGTTTA AGGGAGGTGC AGTGTTGCAG GCAGGAGGTG CGGAAAATTT 

351 TTTCAACATG AACCANTCGG GCAGAAAAGA NGGCNTTTCC CGCGATGACG 

401 ATATTATCGA GGGGGAATAT ACGGTTGAAG ANCCTTACGG CGGCANTCGT 

451 TTCCGAAACG CCNTNGAACA CAAAAAAGAC GAATAA 

This encodes a protein having amino acid sequence <SEQ ID 282>: 

1 MRFFGIGFLV LLFLEIMSIV WVADWLGGGW TLFLMAATFA AGVVMLRHTG

51 LSGLLLAGAA MRSGGRVSVY XMLWXIRYTV AAVCXMSPGF VSSVXAVLLX

101 LPFKGGAVLQ AGGAENFFNM NXSGRKXGXS RDDDIIEGEY TVEXPYGGXR 

151 FRNAXEHKKD E* 
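The amino acid listings are conceptual translations of the nucleotide listings, and unresolved bases (N) give unresolved residues (X), as in ORF73a. A minimal Python sketch under stated assumptions: translation starts at the first base of the string, the standard codon assignments are used (bacterial translation table 11 assigns the same amino acids and differs only in alternative start codons, which are not treated specially here), any codon containing a character other than A, C, G or T becomes X, and a stop codon is written as `*`. The names `CODON_TABLE` and `translate` are illustrative:

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate DNA in frame 1; whitespace and letter case are ignored."""
    seq = "".join(dna.split()).upper()
    residues = []
    for start in range(0, len(seq) - 2, 3):
        codon = seq[start:start + 3]
        # Codons containing N (or any other non-ACGT symbol) are not in the
        # table, so they are reported as X; a trailing partial codon is dropped.
        residues.append(CODON_TABLE.get(codon, "X"))
    return "".join(residues)


if __name__ == "__main__":
    print(translate("ATGAGATTTT TCGGT"))     # MRFFG (start of SEQ ID 281)
    print(translate("CANATGTTGT GGCNTATC"))  # XMLWXI (nt 211-228 of SEQ ID 281)
```

The second call covers nucleotides 211-228 of <SEQ ID 281>, and its output matches residues 71-76 (`XMLWXI`) of <SEQ ID 282>.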

ORF73a and ORF73-1 show 91.3% identity in 161 aa overlap:



10 20 30 40 50 60 

orf73a.pep MRFFGIGFLVLLFLEIMSIVWVADWLGGGWTLFLMAATFAAGVVMLRHTGLSGLLLAGAA
I I I I I I I I I I II I I I I I I I I II I I I I I I I I I I I I I I I I I I I I : I I I I I I I I I II I I II I 
orf73-1 MRFFGIGFLVLLFLEIMSIVWVADWLGGGWTLFLMAAGFAAGVLMLRHTGLSGLLLAGAA

10 20 30 40 50 60 



70 80 90 100 110 120 

orf73a.pep MRSGGRVSVYXMLWXIRYTVAAVCXMSPGFVSSVXAVLLXLPFKGGAVLQAGGAENFFNM
I I I 1 1 I I I I I III Miltllll I I I I I I I I I MM I I I I 1 I f 1 1 1 I I I K I I I I I 1 
orf73-1 MRSGGRVSVYQMLWPIRYTVAAVCLMSPGFVSSVLAVLLLLPFKGGAVLQAGGAENFFNM

70 80 90 100 110 120 



130 140 150 160 

orf73a.pep NXSGRKXGXSRDDDIIEGEYTVEXPYGGXRFRNAXEHKKDEX
I II II I I M I II I II I II I I I I I I I II I II I II I I 
orf73-1 NQSGRKEGFSRDDDIIEGEYTVEEPYGGNRSRNAIEHKKDEX

130 140 150 160 



Homology with a predicted ORF from N. gonorrhoeae

ORF73 shows 92.1% identity over a 76aa overlap with a predicted ORF (ORF73.ng) from N. 
gonorrhoeae: 



orf73.pep MRFFGIGFLVLLFLEIMSIVWVADWLGGGWTLFLMAAGFAAGVLMLRQTGLTGLLLAGAA 60
I I I M I I I I M I I I I I I I M I I M I I M I I I M I M I M II I II M M II M I II I II I 
orf73ng MRFFGIGFLVLLFLEIMSIVWVADWLGGGWTLFLMAATFAAGVLMLRHTGLSGLLLAGAA 60



orf73.pep MRSGGKVSVYQMLWPI 76
::|:||||||||||||
orf73ng VKSSGKVSVYQMLWPIRYTVAAVCLMSPGFVSSVLAVLLLLPFKGGAVLQAGGAENFFNM 120

The complete length ORF73ng nucleotide sequence <SEQ ID 283> is:



1 ATGAGATTTT TCGGTATCGG TTTTTTGGTG CTGCTGTTTT TGGAAATTAT 

51 GTCGATTGTG TGGGTTGCCG ATTGGCTGGG CGGCGGTTGG AcgcTGTTTC 

101 TAATGGCGGC AAGCTTTGCC GCCGGTGTGC TGATGCTCAG GCATAcggGG 

151 CTGTCCGGTC TTTTATTGGC TGGCGCGGCG GTAAAAagta gtgGGAAGGT

201 ATCTGTTTAT CagatgtTGT GGCCTATCCG TTATAcggtg gcggcggtgT 

251 GTCTGatgag tCcggGATTC GTATCCTccg tgttggCGGT ATTGCTGCTG 

301 CTGCcgttta aggGaggGgc agtgttgcag gcaggaggtg cggaaaATTT 

351 TTTCAACATg aaCcaatcgg gcagaaAaga gggatttttc cacgatgacg 

401 atattatcga gggagaatat acggttgaaa aacctgacgg cggcaatcgt

451 tcccgaAAcg ccatcgaaca cgaaaAagac gaataA 

This encodes a protein having amino acid sequence <SEQ ID 284>: 



1 MRFFGIGFLV LLFLEIMSIV WVADWLGGGW TLFLMAATFA AGVLMLRHTG 

51 LSGLLLAGAA VKSSGKVSVY QMLWPIRYTV AAVCLMSPGF VSSVLAVLLL

101 LPFKGGAVLQ AGGAENFFNM NQSGRKEGFF HDDDIIEGEY TVEKPDGGNR

151 SRNAIEHEKD E* 

ORF73ng and ORF73-1 show 93.8% identity in 161 aa overlap:

10 20 30 40 50 60 

orf73-1.pep MRFFGIGFLVLLFLEIMSIVWVADWLGGGWTLFLMAAGFAAGVLMLRHTGLSGLLLAGAA

1 1 1 1 j I j 1 1 1 1 1 1 i 1 1 1 1 1 1 1 1 1 1 1 J 1 1 i 1 1 J 1 1 J f I 1 1 1 1 1 1 1 1 1 1 ! 1 1 1 1 1 1 1 1 1 1 1 

orf73ng MRFFGIGFLVLLFLEIMSIVWVADWLGGGWTLFLMAATFAAGVLMLRHTGLSGLLLAGAA
10 20 30 40 50 60 






70 80 90 100 110 120 

orf73-1.pep MRSGGRVSVYQMLWPIRYTVAAVCLMSPGFVSSVLAVLLLLPFKGGAVLQAGGAENFFNM

:: I: I : I I I I ! I I! I I I I M I i I I I! I I I I I ! I M I ! I I I I I I I I M I I I M I I I I 1 I I I 
orf73ng VKSSGKVSVYQMLWPIRYTVAAVCLMSPGFVSSVLAVLLLLPFKGGAVLQAGGAENFFNM 

70 80 90 100 110 120 

130 140 150 160

orf73-1.pep NQSGRKEGFSRDDDIIEGEYTVEEPYGGNRSRNAIEHKKDEX
I I I I I I I I I : I I I I I I I I I I I I : I I I I I I I I I I I I : I I I I 
orf73ng NQSGRKEGFFHDDDIIEGEYTVEKPDGGNRSRNAIEHEKDEX 
130 140 150 160 

Based on this analysis, including the presence of a putative leader sequence and putative
transmembrane domain in the gonococcal protein, it is predicted that the proteins from
N. meningitidis and N. gonorrhoeae, and their epitopes, could be useful antigens for vaccines or
diagnostics, or for raising antibodies.



Example 34 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 285>:

1 ATGTTTGTTT TTCAGACGGC ATTCTT.ATG TTTCAGAAAC ATTTGCAGAA 

51 AGCCTCCGAC AGCGTCGTCG GAGGGACATT ATACGTGGTT GCCACGCCCA 

101 TCGGCAATTT GGCGGACATT ACCCTGCGCG CTTTGGCGGT ATTGCAAAAG 

151 GCG. GCCGA AGACACGCGC GTTACCGCAC AGCTTTTGAG 

201 CGCGTACGGC ATTCAGGGCA AACTCGTCAG TGTGCGCGAA CACAACGAAC

251 GGCAGATGGC GGACAAGATT GTCGGCTATC TTTCAGACGG CATGGTTGTG 

301 GCACAGGTTT CCGATGCGGG TACGCCGGCC GTGTGCGACC CGGGCGCGAA 

351 ACTCGCCCGC CGCGTGCGTG AGGCCGGGTT TAAAGTCGTT CCCGTCGTGG 

401 GCGCAAC.GC GGTGATGGCG GCTTTGAGCG TGGCCGGTGT GGAAGGATCC 

451 GATTTTTATT TCAACGGTTT TGTACCGCCG AAATCGGGAG AACGCAGGAA

501 ACTGTTTGCC AAATGGGTGC GGGCGGCGTT TCCTATCGTC ATGTTTGAAA

551 CGCCGCACCG CATCGGTGCA GCGCTTGCCG ATATGGCGGA ACTGTTCCCC 

601 GAACGCCGAT TAATGCTGGC GCGCGAAATT ACGAAAACGT TTGAAACGTT 

651 CTTAAGCGGC ACGGTTGGGG AAATTCAGAC GGCATTGTCT GCCGACGGCG 

701 ACCAATCGCG CGGCGAGATG GTGTTGGTGC TTTATCCGGC GCAGGATGAA 

751 AAACACGAAG GCTTGTCCGA GTCCGCGCAA AACATCATGA AAATCCTCAC 

801 AGCCGAGCTG CCGACCAAAC AGGCGGCGGA GCTTGCTGCC AAAATCACGG 

851 GCGAGGGAAA GAAAGCTTTG TACGAT. . 

This corresponds to the amino acid sequence <SEQ ID 286; ORF75>: 



1 MFVFQTAFXM FQKHLQKASD SVVGGTLYVV ATPIGNLADI TLRALAVLQK

51 A AEDTR VTAQLLSAYG IQGKLVSVRE HNERQMADKI VGYLSDGMVV

101 AQVSDAGTPA VCDPGAKLAR RVREAGFKVV PVVGAXAVMA ALSVAGVEGS

151 DFYFNGFVPP KSGERRKLFA KWVRAAFPIV MFETPHRIGA ALADMAELFP 

201 ERRLMLAREI TKTFETFLSG TVGEIQTALS ADGDQSRGEM VLVLYPAQDE 

251 KHEGLSESAQ NIMKILTAEL PTKQAAELAA KITGEGKKAL YD. . 

Further work revealed the complete nucleotide sequence <SEQ ID 287>: 



1 ATGTTTCAGA AACATTTGCA GAAAGCCTCC GACAGCGTCG TCGGAGGGAC 

51 ATTATACGTG GTTGCCACGC CCATCGGCAA TTTGGCGGAC ATTACCCTGC 

101 GCGCTTTGGC GGTATTGCAA AAGGCGGACA TCATCTGTGC CGAAGACACG 

151 CGCGTTACCG CACAGCTTTT GAGCGCGTAC GGCATTCAGG GCAAACTCGT 

201 CAGTGTGCGC GAACACAACG AACGGCAGAT GGCGGACAAG ATTGTCGGCT 

251 ATCTTTCAGA CGGCATGGTT GTGGCACAGG TTTCCGATGC GGGTACGCCG 

301 GCCGTGTGCG ACCCGGGCGC GAAACTCGCC CGCCGCGTGC GTGAGGCCGG 

351 GTTTAAAGTC GTTCCCGTCG TGGGCGCAAG CGCGGTGATG GCGGCTTTGA 

401 GCGTGGCCGG TGTGGAAGGA TCCGATTTTT ATTTCAACGG TTTTGTACCG 

451 CCGAAATCGG GAGAACGCAG GAAACTGTTT GCCAAATGGG TGCGGGCGGC 

501 GTTTCCTATC GTCATGTTTG AAACGCCGCA CCGCATCGGT GCGACGCTTG 

551 CCGATATGGC GGAACTGTTC CCCGAACGCC GATTAATGCT GGCGCGCGAA 

601 ATTACGAAAA CGTTTGAAAC GTTCTTAAGC GGCACGGTTG GGGAAATTCA 

651 GACGGCATTG TCTGCCGACG GCAACCAATC GCGCGGCGAG ATGGTGTTGG 

701 TGCTTTATCC GGCGCAGGAT GAAAAACACG AAGGCTTGTC CGAGTCCGCG 

751 CAAAACATCA TGAAAATCCT CACAGCCGAG CTGCCGACCA AACAGGCGGC 

801 GGAGCTTGCT GCCAAAATCA CGGGCGAGGG AAAGAAAGCT TTGTACGATC 

851 TGGCTCTGTC TTGGAAAAAC AAATAG 

This corresponds to the amino acid sequence <SEQ ID 288; ORF75-l>: 



1 MFQKHLQKAS DSVVGGTLYV VATPIGNLAD ITLRALAVLQ KADIICAEDT

51 RVTAQLLSAY GIQGKLVSVR EHNERQMADK IVGYLSDGMV VAQVSDAGTP 

101 AVCDPGAKLA RRVREAGFKV VPVVGASAVM AALSVAGVEG SDFYFNGFVP

151 PKSGERRKLF AKWVRAAFPI VMFETPHRIG ATLADMAELF PERRLMLARE 

201 ITKTFETFLS GTVGEIQTAL SADGNQSRGE MVLVLYPAQD EKHEGLSESA 

251 QNIMKILTAE LPTKQAAELA AKITGEGKKA LYDLALSWKN K* 

Computer analysis of this amino acid sequence gave the following results: 



Homology with a predicted ORF from N. meningitidis (strain A)

ORF75 shows 95.8% identity over a 283aa overlap with an ORF (ORF75a) from strain A of N. 
meningitidis: 

10 20 30 40 50 60 

orf75.pep MFVFQTAFXMFQKHLQKASDSVVGGTLYVVATPIGNLADITLRALAVLQKAXXXXAEDTR

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
orf75a MFQKHLQKASDSVVGGTLYVVATPIGNLADITLRALAVLQKADIICAEDTR

10 20 30 40 50 



70 80 90 100 110 120 

orf75.pep VTAQLLSAYGIQGKLVSVREHNERQMADKIVGYLSDGMVVAQVSDAGTPAVCDPGAKLAR
II II I III III I Ml IIMMII If ! Ml I MINI II M M Ml II M MMIIIIM I 
orf75a VTAQLLSAYGIQGKLVSVREHNERQMADKIVGYLSDGMVVAQVSDAGTPAVCDPGAKLAR

60 70 80 90 100 110 

130 140 150 160 170 180 

orf75.pep RVREAGFKVVPVVGAXAVMAALSVAGVEGSDFYFNGFVPPKSGERRKLFAKWVRAAFPIV
I I I I : I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I i I I I I I I I I I I I I I ! I : I II : I
orf75a RVREVGFKVVPVVGASAVMAALSVAGVAGSDFYFNGFVPPKSGERRKLFAKWVRVAFPVV
120 130 140 150 160 170 



190 200 210 220 230 240 

orf75.pep MFETPHRIGAALADMAELFPERRLMLAREITKTFETFLSGTVGEIQTALSADGDQSRGEM
I M I I II I I I : I I I I I I I I I I I I I I I I I i I I I I I I I I I I I I I I 1 I I I I I : I II : I 1 I I I 1 
orf75a MFETPHRIGATLADMAELFPERRLMLAREITKTFETFLSGTVGEIQTALAADGNQSRGEM
180 190 200 210 220 230 

250 260 270 280 290

orf75.pep VLVLYPAQDEKHEGLSESAQNIMKILTAELPTKQAAELAAKITGEGKKALYD
IN II I II II Mil I I I Ml I! I I I II II 1 I I I Mil II I I I I I I II I I I I I 
orf75a VLVLYPAQDEKHEGLSESAQNIMKILTAELPTKQAAELAAKITGEGKKALYDLALSWKNK
240 250 260 270 280 290 



orf75a X 

The complete length ORF75a nucleotide sequence <SEQ ID 289> is: 

1 ATGTTTCAGA AACATTTGCA GAAAGCCTCC GACAGCGTCG TCGGAGGGAC 

51 ATTATACGTG GTTGCCACGC CCATCGGCAA TTTGGCGGAC ATTACCCTGC 

101 GCGCTTTGGC GGTATTGCAA AAGGCGGACA TCATCTGTGC CGAAGACACG 

151 CGCGTTACCG CGCAGCTTTT GAGCGCGTAC GGCATTCAGG GCAAACTCGT 

201 CAGCGTGCGC GAACACAACG AACGGCAGAT GGCGGACAAG ATTGTCGGCT 

251 ATCTTTCAGA CGGCATGGTT GTGGCACAGG TTTCCGATGC GGGTACGCCG 

301 GCCGTGTGCG ACCCGGGCGC GAAACTCGCC CGCCGCGTGC GTGAGGTCGG 

351 GTTTAAAGTT GTCCCTGTTG TCGGCGCAAG CGCGGTGATG GCGGCTTTGA 

401 GTGTGGCTGG TGTGGCGGGA TCCGATTTTT ATTTCAACGG TTTTGTACCG 

451 CCGAAATCGG GCGAACGTAG GAAATTGTTT GCCAAATGGG TGCGGGTGGC 

501 GTTTCCCGTC GTGATGTTTG AAACGCCGCA CCGCATCGGG GCGACGCTTG 

551 CCGATATGGC GGAACTGTTC CCCGAACGCC GATTAATGCT GGCGCGCGAA 

601 ATCACGAAAA CGTTTGAAAC GTTCTTAAGC GGCACGGTTG GGGAAATTCA 

651 GACGGCATTG GCGGCGGACG GCAACCAATC GCGCGGCGAG ATGGTGTTGG 

701 TGCTTTATCC GGCGCAGGAT GAAAAACACG AAGGCTTGTC CGAGTCCGCG 

751 CAAAACATCA TGAAAATCCT CACAGCCGAG CTGCCGACCA AACAGGCGGC 

801 GGAGCTTGCC GCCAAAATCA CGGGCGAGGG AAAAAAAGCT TTGTACGATC 

851 TGGCACTGTC TTGGAAAAAC AAATGA 

This encodes a protein having amino acid sequence <SEQ ID 290>: 



1 MFQKHLQKAS DSVVGGTLYV VATPIGNLAD ITLRALAVLQ KADIICAEDT

51 RVTAQLLSAY GIQGKLVSVR EHNERQMADK IVGYLSDGMV VAQVSDAGTP 

101 AVCDPGAKLA RRVREVGFKV VPVVGASAVM AALSVAGVAG SDFYFNGFVP

151 PKSGERRKLF AKWVRVAFPV VMFETPHRIG ATLADMAELF PERRLMLARE 

201 ITKTFETFLS GTVGEIQTAL AADGNQSRGE MVLVLYPAQD EKHEGLSESA 

251 QNIMKILTAE LPTKQAAELA AKITGEGKKA LYDLALSWKN K* 

ORF75a and ORF75-1 show 98.3% identity in 291 aa overlap: 



10 20 30 40 50 60 

orf75a.pep MFQKHLQKASDSVVGGTLYVVATPIGNLADITLRALAVLQKADIICAEDTRVTAQLLSAY
1 I I I I I I I I I I I I I I I I I I I I I I I I I M I I I II I I I I I I I I I I I I I I 1 I I I I I I II I I I I 
orf75-1 MFQKHLQKASDSVVGGTLYVVATPIGNLADITLRALAVLQKADIICAEDTRVTAQLLSAY

10 20 30 40 50 60 



70 80 90 100 110 120 

orf75a.pep GIQGKLVSVREHNERQMADKIVGYLSDGMVVAQVSDAGTPAVCDPGAKLARRVREVGFKV
II | I | I I II I I I I I I I I II I I I I I I I I I II II II II I II I I I I I I II I I I I I I I I : I I I I 
orf75-1 GIQGKLVSVREHNERQMADKIVGYLSDGMVVAQVSDAGTPAVCDPGAKLARRVREAGFKV

70 80 90 100 110 120 

130 140 150 160 170 180 

orf75a.pep VPVVGASAVMAALSVAGVAGSDFYFNGFVPPKSGERRKLFAKWVRVAFPVVMFETPHRIG
Mill Mill II I MM I Mill II II MM Mill MM I III: IIMIIIIIIllll 
orf75-1 VPVVGASAVMAALSVAGVEGSDFYFNGFVPPKSGERRKLFAKWVRAAFPIVMFETPHRIG

130 140 150 160 170 180 

190 200 210 220 230 240 

orf75a.pep ATLADMAELFPERRLMLAREITKTFETFLSGTVGEIQTALAADGNQSRGEMVLVLYPAQD
I I M M I I I I 1 I I I I II I I I I I M I I I I I I I M M I I I I h I M I I II I M I I 1 I I I I M 
orf75-1 ATLADMAELFPERRLMLAREITKTFETFLSGTVGEIQTALSADGNQSRGEMVLVLYPAQD

190 200 210 220 230 240 



250 260 270 280 290 

orf75a.pep EKHEGLSESAQNIMKILTAELPTKQAAELAAKITGEGKKALYDLALSWKNKX
I I I I I I I I I I i I I I I I I I I I I I I I I I I I I I I I I I I I I I I 1 I I I I I I I I I I I I 
orf75-1 EKHEGLSESAQNIMKILTAELPTKQAAELAAKITGEGKKALYDLALSWKNKX

250 260 270 280 290 

Homology with a predicted ORF from N. gonorrhoeae 

ORF75 shows 93.2% identity over a 292aa overlap with a predicted ORF (ORF75.ng) from N. gonorrhoeae:




orf75.pep MFVFQTAFXMFQKHLQKASDSVVGGTLYVVATPIGNLADITLRALAVLQKA----AEDTR 56

I I I i I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

orf75ng MSVFQTAFFMFQKHLQKASDSVVGGTLYVVATPIGNLADITLRALAVLQKADIICAEDTR 60

orf75.pep VTAQLLSAYGIQGKLVSVREHNERQMADKIVGYLSDGMVVAQVSDAGTPAVCDPGAKLAR 116
I | | | | I I | I I I I I : I I I I I I I I I I I I I I I:: I : I I I I : I I I I I I I I I I I I I I I I I I I i I I 

orf75ng VTAQLLSAYGIQGRLVSVREHNERQMADKVIGFLSDGLVVAQVSDAGTPAVCDPGAKLAR 120

orf75.pep RVREAGFKVVPVVGAXAVMAALSVAGVEGSDFYFNGFVPPKSGERRKLFAKWVRAAFPIV 176

I 1 I I I I I || I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I : I 

orf75ng RVREAGFKVVPVVGASAVMAALSVAGVAESDFYFNGFVPPKSGERRKLFAKWVRAAFPVV 180

orf75.pep MFETPHRIGAALADMAELFPERRLMLAREITKTFETFLSGTVGEIQTALSADGDQSRGEM 236
Ml IE 11111:11 I I II MM! II I I I! I! I I I III II I M MM I II I: II II I 

orf75ng MFETPHRIGATLADMAELFPERRLMLAREITKTFETFLSGTVGEIQTALAADGNQSRGEM 240

orf75.pep VLVLYPAQDEKHEGLSESAQNIMKILTAELPTKQAAELAAKITGEGKKALYD 288

I I I I I I I II I I I I I I I I I I I I I I I I : II I I I I I II I I I I I I I I I I I I I I I I 

orf75ng VLVLYPAQDEKHEGLSESAQNAMKILAAELPTKQAAELAAKITGEGKKALYDLALSWKNK 300



An ORF75ng nucleotide sequence <SEQ ID 291> was predicted to encode a protein having amino acid sequence <SEQ ID 292>:



1 MSVFQTAFFM FQKHLQKASD SVVGGTLYVV ATPIGNLADI TLRALAVLQK

51 ADIICAEDTR VTAQLLSAYG IQGRLVSVRE HNERQMADKV IGFLSDGLVV

101 AQVSDAGTPA VCDPGAKLAR RVREAGFKVV PVVGASAVMA ALSVAGVAES

151 DFYFNGFVPP KSGERRKLFA KWVRAAFPVV MFETPHRIGA TLADMAELFP 

201 ERRLMLAREI TKTFETFLSG TVGEIQTALA ADGNQSRGEM VLVLYPAQDE 

251 KHEGLSESAQ NAMKILAAEL PTKQAAELAA KITGEGKKAL YDLALSWKNK 

301 * 

After further analysis, the following gonococcal DNA sequence <SEQ ID 293> was identified:



1 ATGTTTCAGA AACACTTGCA GAAAGCCTCC GACAGCGTCG TCGGAGGGAC 

51 ATTATACGTG GTTGCCACGC CCATCGGCAA TTTGGCAGAC ATTACCCTGC 

101 GCGCTTTGGC GGTATTGCAA AAGGCGGACA TCATTTGTGC CGAAGACACG 

151 CGCGTTACTG CGCAGCTTTT GAGCGCGTAC GGCATTCAGG GCAGGTTGGT 

201 CAGTGTGCGC GAACACAACG AGCGGCAGAT GGCGGACAAG GTAATCGGTT 

251 TCCTTTCAGA CGGCCTGGTT GTGGCGCAGG TTTCCGATGC GGGTACGCCG 

301 GCCGTGTGCG ACCCGGGCGC GAAACTCGCC CGCCGCGTGC GCGAAGCAGG 

351 GTTCAAAGTC GTTCCCGTCG TGGGCGCAAG CGCGGTAATG GCGGCGTTGA 

401 GTGTGGCCGG TGTGGCGGAA TCCGATTTTT ATTTCAACGG TTTTGTACCG 

451 CCGAAATCGG GCGAACGTAG GAAATTGTTT GCCAAATGGG TGCGGGCGGC 

501 ATTTCCTGTC GTCATGTTTG AAACGCCGCA CCGAATCGGG GCAACGCTTG 

551 CCGATATGGC GGAATTGTTC CCCGAACGCC GTCTGATGCT GGCGCGCGAA 

601 ATCACGAAAA CGTTTGAAAC GTTCTTAAGC GGCACGGTTG GGGAAATTCA 

651 GACGGCATTG GCGGCGGACG GCAACCAATC GCGCGGCGAG ATGGTGTTGG 

701 TGCTTTATCC GGCGCAGGAT GAAAAACACG AAGGCTTGTC CGAGTCTGCG 

751 CAAAATGCGA TGAAAATCCT TGCGGCCGAG CTGCCGACCA AGCAGGCGGC 

801 GGAGCTTGCC GCCAAGATTA CAGGTGAGGG CAAAAAGGCT TTGTACGATT 

851 TGGCACTGTC GTGGAAAAAC AAATGA 



This corresponds to the amino acid sequence <SEQ ID 294; ORF75ng-1>:
1 MFQKHLQKAS DSVVGGTLYV VATPIGNLAD ITLRALAVLQ KADIICAEDT

51 RVTAQLLSAY GIQGRLVSVR EHNERQMADK VIGFLSDGLV VAQVSDAGTP

101 AVCDPGAKLA RRVREAGFKV VPVVGASAVM AALSVAGVAE SDFYFNGFVP

151 PKSGERRKLF AKWVRAAFPV VMFETPHRIG ATLADMAELF PERRLMLARE

201 ITKTFETFLS GTVGEIQTAL AADGNQSRGE MVLVLYPAQD EKHEGLSESA

251 QNAMKILAAE LPTKQAAELA AKITGEGKKA LYDLALSWKN K*



ORF75ng-1 and ORF75-1 show 96.2% identity in 291 aa overlap:
10 20 30 40 50 60

orf75-1.pep MFQKHLQKASDSVVGGTLYVVATPIGNLADITLRALAVLQKADIICAEDTRVTAQLLSAY
MM Mill III II It Mill III II II III Mill MM I Mill II Mill llilll I 
orf75ng-1 MFQKHLQKASDSVVGGTLYVVATPIGNLADITLRALAVLQKADIICAEDTRVTAQLLSAY

10 20 30 40 50 60 

70 80 90 100 110 120 

orf75-1.pep GIQGKLVSVREHNERQMADKIVGYLSDGMVVAQVSDAGTPAVCDPGAKLARRVREAGFKV

I I I I : I I I I I I 1 I I ! I 11 I I :: I : I I I I : I I I I I I I I I I I I I I I I I 1 I I I I I I I I I I I I I 
orf75ng-1 GIQGRLVSVREHNERQMADKVIGFLSDGLVVAQVSDAGTPAVCDPGAKLARRVREAGFKV

70 80 90 100 110 120 

130 140 150 160 170 180 

orf75-1.pep VPVVGASAVMAALSVAGVEGSDFYFNGFVPPKSGERRKLFAKWVRAAFPIVMFETPHRIG
II | II II I I I I I II I I I I I II I II I I I i I I I II II 1 I I I II II I II I : I I II II I I I I 
orf75ng-1 VPVVGASAVMAALSVAGVAESDFYFNGFVPPKSGERRKLFAKWVRAAFPVVMFETPHRIG

130 140 150 160 170 180 

190 200 210 220 230 240 

orf75-1.pep ATLADMAELFPERRLMLAREITKTFETFLSGTVGEIQTALSADGNQSRGEMVLVLYPAQD

II I || I II II I I M II ! II II II II II I I I I I II II II II : I II II II I I II II II II I I 
orf75ng-1 ATLADMAELFPERRLMLAREITKTFETFLSGTVGEIQTALAADGNQSRGEMVLVLYPAQD

190 200 210 220 230 240 

250 260 270 280 290 

orf75-1.pep EKHEGLSESAQNIMKILTAELPTKQAAELAAKITGEGKKALYDLALSWKNKX
II I I I II II II 1 II II : I I I I I I II II II I I II I II I II I II II I II II II 
orf75ng-1 EKHEGLSESAQNAMKILAAELPTKQAAELAAKITGEGKKALYDLALSWKNKX

250 260 270 280 290 

Furthermore, ORF75ng-1 shows significant homology to a hypothetical E. coli protein:

sp|P45528|YRAL_ECOLI HYPOTHETICAL 31.3 KD PROTEIN IN AGAI-MTR INTERGENIC REGION 
(F286) 

>gi|606086 (U18997) ORF_f286 [Escherichia coli]

>gi|1789535 (AE000395) hypothetical 31.3 kD protein in agai-mtr intergenic
region [Escherichia coli] Length = 286
Score = 218 bits (550), Expect = 3e-56

Identities = 128/284 (45%), Positives = 171/284 (60%), Gaps = 4/284 (1%)



Query: 4   KHLQKASDSVVGGTLYVVATPIGNLADITLRALAVLQKADIICAEDTRVTAQLLSAYGIQ 63
           K  Q A +S   G LY+V TPIGNLADIT RAL VLQ  D+I AEDTR T  LL  +GI
Sbjct: 2   KQHQSADNSQ--GQLYIVPTPIGNLADITQRALEVLQAVDLIAAEDTRHTGLLLQHFGIN 59

Query: 64  GRLVSVREHNERQMADKVIGFLSDGLVVAQVSDAGTPAVCDPGAKLARRVREAGFKVVPV 123
            RL ++ +HNE+Q A+ ++  L +G  +A VSDAGTP + DPG  L R  REAG +VVP+
Sbjct: 60  ARLFALHDHNEQQKAETLLAKLQEGQNIALVSDAGTPLINDPGYHLVRTCREAGIRVVPL 119

Query: 124 VGASAVMAALSVAGVAESDFYFNGFVPPKSGERRKLFAKWVRAAFPVVMFETPHRIGATL 183
            G  A + ALS AG+    F + GF+P KS  RR            ++ +E+ HR+  +L
Sbjct: 120 PGPCAAITALSAAGLPSDRFCYEGFLPAKSKGRRDALKAIEAEPRTLIFYESTHRLLDSL 179

Query: 184 ADMAELFPERR-LMLAREITKTFETFLSGTVGEIQTALAADGNQSRGEMVLVLYPAQDEK 242
            D+  +  E R ++LARE+TKT+ET     VGE+   +  D N+ +GEMVL++      +
Sbjct: 180 EDIVAVLGESRYVVLARELTKTWETIHGAPVGELLAWVKEDENRRKGEMVLIV-EGHKAQ 238

Query: 243 HEGLSESAQNAMKILAAELPTKQAAELAAKITGEGKKALYDLAL 286
            E L   A   + +L AELP K+AA LAA+I G  K ALY  AL
Sbjct: 239 EEDLPADALRTLALLQAELPLKKAAALAAEIHGVKKNALYKYAL 282
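The Identities and Gaps figures in the BLAST summary line can be recounted directly from the aligned rows. A minimal sketch, assuming "-" marks a gap and that both counts are taken over the full alignment length; the rows are transcribed from the Query (ORF75ng-1) and Sbjct (E. coli) lines above, and `hsp_counts` is a hypothetical helper name:

```python
from __future__ import annotations

# Aligned rows transcribed from the Query / Sbjct lines of the BLAST hit.
QUERY_ROWS = [
    "KHLQKASDSVVGGTLYVVATPIGNLADITLRALAVLQKADIICAEDTRVTAQLLSAYGIQ",
    "GRLVSVREHNERQMADKVIGFLSDGLVVAQVSDAGTPAVCDPGAKLARRVREAGFKVVPV",
    "VGASAVMAALSVAGVAESDFYFNGFVPPKSGERRKLFAKWVRAAFPVVMFETPHRIGATL",
    "ADMAELFPERR-LMLAREITKTFETFLSGTVGEIQTALAADGNQSRGEMVLVLYPAQDEK",
    "HEGLSESAQNAMKILAAELPTKQAAELAAKITGEGKKALYDLAL",
]
SBJCT_ROWS = [
    "KQHQSADNSQ--GQLYIVPTPIGNLADITQRALEVLQAVDLIAAEDTRHTGLLLQHFGIN",
    "ARLFALHDHNEQQKAETLLAKLQEGQNIALVSDAGTPLINDPGYHLVRTCREAGIRVVPL",
    "PGPCAAITALSAAGLPSDRFCYEGFLPAKSKGRRDALKAIEAEPRTLIFYESTHRLLDSL",
    "EDIVAVLGESRYVVLARELTKTWETIHGAPVGELLAWVKEDENRRKGEMVLIV-EGHKAQ",
    "EEDLPADALRTLALLQAELPLKKAAALAAEIHGVKKNALYKYAL",
]
QUERY = "".join(QUERY_ROWS)
SBJCT = "".join(SBJCT_ROWS)


def hsp_counts(query: str, sbjct: str) -> tuple[int, int, int]:
    """Return (identities, gap columns, alignment length) for two aligned rows."""
    if len(query) != len(sbjct):
        raise ValueError("aligned rows must be the same length")
    identities = sum(1 for q, s in zip(query, sbjct) if q == s and q != "-")
    gaps = sum(1 for q, s in zip(query, sbjct) if "-" in (q, s))
    return identities, gaps, len(query)


if __name__ == "__main__":
    ident, gaps, length = hsp_counts(QUERY, SBJCT)
    print(f"Identities = {ident}/{length} ({round(100 * ident / length)}%), "
          f"Gaps = {gaps}/{length} ({round(100 * gaps / length)}%)")
```

This prints `Identities = 128/284 (45%), Gaps = 4/284 (1%)`, agreeing with the summary line. The Positives count additionally depends on a substitution matrix (BLOSUM62 is the blastp default), which this sketch does not include.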
Based on this analysis, including the presence of a putative transmembrane domain in the
gonococcal protein, it is predicted that the proteins from N. meningitidis and N. gonorrhoeae, and 
their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies. 

Example 35 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 295>:

1 ATGAAACAGA AAAAAACCGC TGCCGCAGTT ATTGCTGCAA TGTTGGCAGG 
51 TTTTGCGGCA GC.AAAGCAC CCGAAATCGA CCCGGCTTTG 

// 

651 GAGTTGG TCAGAAACCA GTTGGAGCAG GGTTTGAGAC 

701 AGGAAAAAGC CCGCTTGAAA ATCGATGCCC TTTTGGAAGA AAACGGTGTC 
751 AAACCGTAA 

This corresponds to the amino acid sequence <SEQ ID 296; ORF76>: 

1 MKQKKTAAAV IAAMLAGFAA XKAPEIDPAL 

// 

201 ELVRNQLEQG LRQEKARLKI DALLEENGVK 

251 P* 

Further work revealed the complete nucleotide sequence <SEQ ID 297>: 

1 ATGAAACAGA AAAAAACCGC TGCCGCAGTT ATTGCTGCAA TGTTGGCAGG 

51 TTTTGCGGCA GCCAAAGCAC CCGAAATCGA CCCGGCTTTG GTGGATACGC 

101 TGGTGGCGCA GATCATGCAG CAGGCAGACC GGCATGCGGA GCAGTCCCAA 

151 AAACCGGACG GGCAGGCAAT CCGAAACGAT GCCGTCCGCC GGCTACAAAC 

201 TTTGGAAGTT TTGAAAAACA GGGCATTGAA GGAAGGTTTG GATAAGGATA 

251 AGGATGTCCA AAACCGCTTT AAAATCGCCG AAGCGTCTTT TTATGCCGAG 

301 GAGTACGTCC GTTTTCTGGA ACGTTCGGAA ACGGTTTCCG AAGACGAGCT 

351 GCACAAGTTT TACGAACAGC AAATCCGCAT GATCAAATTG CAGCAGGTCA 

401 GCTTCGCAAC CGAAGAGGAG GCGCGTCAGG CGCAGCAGCT CCTGCTCAAA 

451 GGGCTGTCTT TTGAAGGGCT GATGAAGCGT TATCCGAACG ACGAGCAGGC 

501 TTTTGACGGT TTCATTATGG CGCAGCAGCT TCCCGAGCCG CTGGCTTCGC 

551 AGTTTGCCGC GATGAATCGG GGCGACGTTA CCCGCGATCC GGTCAAATTG 

601 GGCGAACGCT ATTATCTGTT CAAACTCAGC GAGGTCGGGA AAAACCCCGA 

651 CGCGCAGCCT TTCGAGTTGG TCAGAAACCA GTTGGAGCAG GGTTTGAGAC 

701 AGGAAAAAGC CCGCTTGAAA ATCGATGCCC TTTTGGAAGA AAACGGTGTC 

751 AAACCGTAA 

This corresponds to the amino acid sequence <SEQ ID 298; ORF76-l>: 

1 MKQKKTAAAV IAAMLAGFAA AKAPEIDPAL VDTLVAQIMQ QADRHAEQSQ
51 KPDGQAIRND AVRRLQTLEV LKNRALKEGL DKDKDVQNRF KIAEASFYAE 
101 EYVRFLERSE TVSEDELHKF YEQQIRMIKL QQVSFATEEE ARQAQQLLLK 
151 GLSFEGLMKR YPNDEQAFDG FIMAQQLPEP LASQFAAMNR GDVTRDPVKL 
201 GERYYLFKLS EVGKNPDAQP FELVRNQLEQ GLRQEKARLK IDALLEENGV 
251 KP* 

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A) 

ORF76 shows 96.7% identity over a 30aa overlap and 96.8% identity over a 31aa overlap with an 
ORF (ORF76a) from strain A of N. meningitidis: 

10 20 30 

orf76.pep MKQKKTAAAVIAAMLAGFAAXKAPEIDPAL
I I I I I I I I I I I I I i I I I I I I I I I I I I I I I 
orf76a MKQKKTAAAVIAAMLAGFAAAKAPEIDPALVDTLVAQIMQQADRHAEQSQKPDGQAIRND

10 20 30 40 50 60 

// 

70 80 90 
orf76.pep XELVRNQLEQGLRQEKARLKIDALLEENGVKPX

I I I I I I I I I II I I I I I I I I I I I : I I I I I I I I I 
orf76a DVTRDPVKLGERYYLFKLSEVGKNPDAQPFELVRNQLEQGLRQEKARLKIDAILEENGVKPX
200 210 220 230 240 250 

The complete length ORF76a nucleotide sequence <SEQ ID 299> is: 



1 ATGAAACAGA AAAAAACCGC TGCCGCAGTT ATTGCTGCAA TGTTGGGAGG 

51 TTTTGCGGCA GCCAAAGCAC CCGAAATCGA CCCGGCTTTG GTGGATACGC 

101 TGGTGGCGCA GATCATGCAG CAGGCAGACC GGCATGCGGA GCAGTCCCAA 

151 AAACCGGACG GGCAGGCAAT CCGAAACGAT GCCGTCCGTC GGCTGCAAAC 

201 TTTGGAAGTT TTGAAAAACA GGGCATTGAA GGAAGGTTTG GATAAGGATA 

251 AGGATGTCCA AAACCGCTTT AAAATCGCCG AAGCGTCTTT TTATGCCGAG 

301 GAGTACGTCC GTTTTCTGGA ACGTTCGGAA ACGGTTTCCG AAAGCGCACT 

351 GCGTCAGTTT TATGAGCGGC AAATCCGCAT GATCAAATTG CAGCAGGTCA 

401 GCTTCGCAAC CGAAGAGGAG GCGCGTCAGG CGCAGCAGCT CCTGCTCAAA 

451 GGGCTGTCTT TTGAAGGGCT GATGAAGCGT TATCCGAACG ACGAGCAGGC 

501 TTTTGACGGT TTCATTATGG CGCAGCAGCT TCCCGAGCCG CTGGCTTCGC 

551 AGTTTGCAGC GATGAATCGG GGCGACGTTA CCCGCGATCC GGTCAAATTG 

601 GGCGAACGCT ATTATCTGTT CAAACTCAGC GAGGTCGGGA AAAACCCCGA 

651 CGCGCAGCCT TTCGAGTTGG TCAGAAACCA GTTGGAACAA GGTTTGAGAC 

701 AGGAAAAAGC CCGCTTGAAA ATCGATGCCA TTTTGGAAGA AAACGGTGTC 

751 AAACCGTAA 

This encodes a protein having amino acid sequence <SEQ ID 300>: 

1 MKQKKTAAAV IAAMLAGFAA AKAPEIDPAL VDTLVAQIMQ QADRHAEQSQ

51 KPDGQAIRND AVRRLQTLEV LKNRALKEGL DKDKDVQNRF KIAEASFYAE 

101 EYVRFLERSE TVSESALRQF YERQIRMIKL QQVSFATEEE ARQAQQLLLK 

151 GLSFEGLMKR YPNDEQAFDG FIMAQQLPEP LASQFAAMNR GDVTRDPVKL 

201 GERYYLFKLS EVGKNPDAQP FELVRNQLEQ GLRQEKARLK IDAILEENGV 

251 KP* 

ORF76a and ORF76-1 show 97.6% identity in 252 aa overlap: 



10 20 30 40 50 60 

orf76a.pep MKQKKTAAAVIAAMLAGFAAAKAPEIDPALVDTLVAQIMQQADRHAEQSQKPDGQAIRND
I I I I I I I I I I I I I M I I I I I I II I I I I I I I I I I I I I I I I I I i I I I I I I I I I I I I I I I I I I 
orf76-1 MKQKKTAAAVIAAMLAGFAAAKAPEIDPALVDTLVAQIMQQADRHAEQSQKPDGQAIRND

10 20 30 40 50 60 

70 80 90 100 110 120 

orf76a.pep AVRRLQTLEVLKNRALKEGLDKDKDVQNRFKIAEASFYAEEYVRFLERSETVSESALRQF
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I II I I I I : I : : I 
orf76-1 AVRRLQTLEVLKNRALKEGLDKDKDVQNRFKIAEASFYAEEYVRFLERSETVSEDELHKF

70 80 90 100 110 120 



130 140 150 160 170 180 

orf76a.pep YERQIRMIKLQQVSFATEEEARQAQQLLLKGLSFEGLMKRYPNDEQAFDGFIMAQQLPEP
I I: I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I 
orf76-1 YEQQIRMIKLQQVSFATEEEARQAQQLLLKGLSFEGLMKRYPNDEQAFDGFIMAQQLPEP

130 140 150 160 170 180 



190 200 210 220 230 240 

orf76a.pep LASQFAAMNRGDVTRDPVKLGERYYLFKLSEVGKNPDAQPFELVRNQLEQGLRQEKARLK
I I I I I I I I I I I I I I I I 1 1 1 1 I I I I I I I I I I I I I II I I I I I I I I I I 1 1 I I I I I I ! 1 I I I 1 1 
orf76-1 LASQFAAMNRGDVTRDPVKLGERYYLFKLSEVGKNPDAQPFELVRNQLEQGLRQEKARLK

190 200 210 220 230 240 



250 

orf76a.pep IDAILEENGVKPX
|||:|||||||||
orf76-1 IDALLEENGVKPX

250 



Homology with a predicted ORF from N. gonorrhoeae

The N- and C-terminal regions of ORF76 aligned with a predicted ORF (ORF76.ng) from N. gonorrhoeae
show 96.7% and 100% identity over 30 aa and 31 aa overlaps, respectively:
orf76.pep MKQKKTAAAVIAAMLAGFAAXKAPEIDPAL 30

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
orf76ng MKQKKTAAAVIAAMLAGFAAAKAPEIDPALVDTLVAQIMQQADRHAEQSQRPDGQAIRND 60

// 

orf76.pep ELVRNQLEQGLRQEKARLKIDALLEENGVKP 251

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
orf76ng VTRNPVKLGERYYLFKLGAVGKNPDAQPFELVRNQLEQGLRQEKARLKIDALLEENGVKP 251



The complete length ORF76ng nucleotide sequence <SEQ ID 301> is:



1 ATGAAACAGA AAAAGACCGC TGCCGCAGTT ATTGCTGCAA TGTTGGCAGG

51 TTTTGCGGCA GCCAAAGCAC CCGAAATCGA CCCGGCTTTG GTGGATACGC

101 TGGTGGCGCA GATCATGCAG CAGGCAGACC GGCATGCGGA GCAGTCCCAA

151 AGACCGGACG GGCAGGCAAT CCGAAACGAT GCCGTCCGCC GGCTGCAAAC

201 TTTGGAAGTT TTGAAAAACA GGGCATTGAA GGAAGGTTTG GATAAGGATA

251 AGGATGTCCA AAACCGCTTT AAAATCGCCG AAGCGTCTTT TTATGCCGAG

301 GAGTACGTCC GTTTTCTGGA ACGTTCGGAA ACGGTTTCCG AAAGCGCACT

351 GCGTCAGTTT TATGAGCGGC AAATCCGCAT GATCAAATTG CAGCAGGTCA

401 GCTTCGCAAC CGAAGAGGAG GCGCGTCAGG CGCAGCAGCT CCTGCTCAAA

451 GGGCTGTCTT TTGAAGGGCT GATGAAGCGT TATCCGAACG ACGAGCAGGC

501 GTTCGACGGT TTCATTATGG CGCAGCAGCT TCCCGAGCCG CTGGCTTcgc

551 agtttgCCGG TATGAACCGT GGCGACGTTA CCCGCAATCC GGTCAAATTG

601 GGCGAACGCT ATTACCTGTT CAAACTCGGC GCGGTCGGGA AAAACCCCGA

651 CGCGCAGCCT TTCGAGTTGG TCAGAAACCA GTTGGAACAA GGTTTGAGGC

701 AGGAAAAAGC CCGCTTGAAA ATCGATGCCC TTTTGGAaga Aaacggtgtc

751 AaacCGTAA



This encodes a protein having amino acid sequence <SEQ ID 302>:

1 MKQKKTAAAV IAAMLAGFAA AKAPEIDPAL VDTLVAQIMQ QADRHAEQSQ

51 RPDGQAIRND AVRRLQTLEV LKNRALKEGL DKDKDVQNRF KIAEASFYAE 

101 EYVRFLERSE TVSESALRQF YERQIRMIKL QQVSFATEEE ARQAQQLLLK

151 GLSFEGLMKR YPNDEQAFDG FIMAQQLPEP LASQFAGMNR GDVTRNPVKL 

201 GERYYLFKLG AVGKNPDAQP FELVRNQLEQ GLRQEKARLK IDALLEENGV 

251 KP* 

ORF76ng and ORF76-1 show 96.0% identity in 252 aa overlap:






10 20 30 40 50 60 

orf76-1.pep MKQKKTAAAVIAAMLAGFAAAKAPEIDPALVDTLVAQIMQQADRHAEQSQKPDGQAIRND
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I : I I I I I I I I I 
orf76ng MKQKKTAAAVIAAMLAGFAAAKAPEIDPALVDTLVAQIMQQADRHAEQSQRPDGQAIRND

10 20 30 40 50 60 

70 80 90 100 110 120 

orf76-1.pep AVRRLQTLEVLKNRALKEGLDKDKDVQNRFKIAEASFYAEEYVRFLERSETVSEDELHKF

I I I I I M I II I I I M I M I I II I M I I I I I I II I I I I I I M I I I I i I I II I I I I : |::| 
orf76ng AVRRLQTLEVLKNRALKEGLDKDKDVQNRFKIAEASFYAEEYVRFLERSETVSESALRQF

70 80 90 100 110 120 

130 140 150 160 170 180 

orf76-1.pep YEQQIRMIKLQQVSFATEEEARQAQQLLLKGLSFEGLMKRYPNDEQAFDGFIMAQQLPEP
11:1 II IN II IN II II Mllll I I Ml II lllll MM! I I Ml II I I I M II Mill 
orf76ng YERQIRMIKLQQVSFATEEEARQAQQLLLKGLSFEGLMKRYPNDEQAFDGFIMAQQLPEP

130 140 150 160 170 180 

190 200 210 220 230 240 

orf76-1.pep LASQFAAMNRGDVTRDPVKLGERYYLFKLSEVGKNPDAQPFELVRNQLEQGLRQEKARLK

II I I I I : I I I I I I I I : I I I I M i I I I I I I : II II II I M M I M M M I I I I II I I I I I 
orf76ng LASQFAGMNRGDVTRNPVKLGERYYLFKLGAVGKNPDAQPFELVRNQLEQGLRQEKARLK 

190 200 210 220 230 240 

250 

orf76-1.pep IDALLEENGVKPX
|||||||||||||
orf76ng IDALLEENGVKPX

250 

Furthermore, ORF76ng shows significant homology to a B. subtilis export protein precursor:
sp|P24327|PRSA_BACSU PROTEIN EXPORT PROTEIN PRSA PRECURSOR >gi|98227|pir||S15269
33K lipoprotein - Bacillus subtilis >gi 139782 (X57271) 33kDa lipoprotein
[Bacillus subtilis] 

>gi|2226124|gnl|PID|e325181 (Y14077) 33kDa lipoprotein [Bacillus subtilis] 
>gi|2633331|gnl|PID|e1182997 (Z99109) molecular chaperonin [Bacillus subtilis]
Length =292 
Score = 50.4 bits (118), Expect = 1e-05

Identities = 48/199 (24%), Positives = 82/199 (41%), Gaps = 32/199 (16%)

Query: 70 VLKNRALKEGLDK DKDVQNRFKI AE AS F YAEE YVRFLERSETVS E 114 

VL ++ LDK DK++ N+ K + Y ++Y++ + E +++ 

Sbjct: 53 VLTQLVQEKVLDKKYKVSDKEIDNKLKEYKTQLGDQYTALEKQYGKDYLKEQVKYELLTQ 112 

Query: 115 SA LRQFYERQIRMIKLQQVSFATEEEARQAQQLLLKGLSFEGLMKRYPN 163 

A +++++E 1+ + A ++ A + ++ L KG FE L K Y 

Sbjct: 113 KAAKDNIKVTDADIKEYWEGLKGKIRASHILVADKKTAEEVEKKLKKGEKFEDLAKEYST 172 

Query: 164 DEQAFDG FIMAQQLPEPLASQFAAMNRGDVTRDPVKLGERYYLFKLSEVGKNPDA 218 

DAG F Q+E+ + G+V+ DPVK Y++ K +E D 

Sbjct: 173 DSSASKGGDLGWFAKEGQMDETFSKAAFKLKTGEVS-DPVKTQYGYHIIKKTEERGKYDD 231 

Query: 219 QPFELVRNQLEQGLRQEKA 237 

EL LEQ L A 
Sbjct: 232 MKKELKSEVLEQKLNDNAA 250 

Based on this analysis, including the presence of a putative leader sequence and an RGD motif in
the gonococcal protein, it was predicted that the proteins from N. meningitidis and N. gonorrhoeae,
and their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies.
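The RGD (Arg-Gly-Asp) tripeptide referred to above can be located with a simple pattern search. The sketch below uses a hypothetical helper, `find_motif` (not part of the source), that reports 1-based residue positions; it is run on the ORF76ng alignment row covering residues 181-240, and the offset of 181 is taken from the alignment ruler.

```python
import re


def find_motif(seq: str, motif: str, offset: int = 1) -> list[int]:
    """Return the residue number of every occurrence of `motif` in `seq`.

    `offset` is the residue number of seq[0], so the default gives 1-based
    positions. A zero-width lookahead keeps overlapping occurrences.
    """
    pattern = re.compile(f"(?=({re.escape(motif)}))")
    return [m.start() + offset for m in pattern.finditer(seq)]


# ORF76ng alignment row for residues 181-240, copied from the alignment above.
orf76ng_181_240 = "LASQFAGMNRGDVTRNPVKLGERYYLFKLGAVGKNPDAQPFELVRNQLEQGLRQEKARLK"
print(find_motif(orf76ng_181_240, "RGD", offset=181))  # [190]
```

A plain substring scan is enough for a fixed tripeptide; the regular expression form is used only so that the same helper also handles overlapping matches.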

ORF76-1 (27.8 kDa) was cloned in the pET vector and expressed in E. coli, as described above. The
products of protein expression and purification were analyzed by SDS-PAGE. Figure 10A shows
the results of affinity purification of the His-fusion protein. Purified His-fusion protein was used
to immunise mice, whose sera were used for Western blot (Figure 10B), ELISA (positive result),
and FACS analysis (Figure 10C). These experiments confirm that ORF76-1 is a surface-exposed
protein, and that it is a useful immunogen.
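A size such as the 27.8 kDa quoted for ORF76-1 is commonly estimated from sequence by summing average residue masses and adding one water. The sketch below assumes standard average residue masses as listed in the code; it is not necessarily how the figure above was obtained (that value may, for example, include or exclude the leader peptide or the His tag), and the ORF76-1 sequence itself is not reproduced in this section, so no value is computed for it here.

```python
# Assumed average residue masses in daltons (free amino acid minus one water).
AVG_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01524  # added once for the free N- and C-termini


def average_mass_kda(protein: str) -> float:
    """Approximate average molecular mass of a protein, in kDa."""
    seq = "".join(protein.split()).rstrip("*")  # drop listing spaces and a stop
    unknown = set(seq) - set(AVG_RESIDUE_MASS)
    if unknown:
        raise ValueError(f"cannot weigh non-standard residues: {sorted(unknown)}")
    return (sum(AVG_RESIDUE_MASS[aa] for aa in seq) + WATER) / 1000.0


# Example: a hexahistidine tag on its own.
print(round(average_mass_kda("HHHHHH"), 3))  # 0.841
```

Residues written as X in the listings have no defined mass, so the helper refuses them rather than guessing.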

Example 36 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 303>:

1 ATGAAAAAAT CTTTCCTTAC GCTTGTTCTG TATTCGTCTT TACTTACCGC 

51 CAGCGAAATT GCCJTACCCC TTGGAATTGG GGATTGAAAC CTTACCGGCG 

101 GCAAAAATTG CGGAAACGTT TGCGCTGACA TTTGTGATTG CTGCGCTGTA 

151 TCTGTTTGCG CGTAATAAGG TGACGCGTTT GTTGATTGCG GTGTTTTTTG 

201 CGTTCAGCAT TATTGCCAAC AATGTGCATT ACGCGGATTA TCAAAGCTGG 

251 ATGACG 

// 

1201 CAAACCGTAT TCGAGCAGCT GCAAAAGACT CCTGACGGCA 

1251 ACTGGCTGTT TGCCTATACC TCCGATCATG GCCAGTATGT TCGCCAAGAT 

1301 ATCTACAATC AAGGCACGGT GCAGCCCGAC AGCTATCTCG TGCCGCTAGT 

1351 GTTGTACAGC CCGGATAAGG CCGTGCAACA GGCTGCCAAC CAGGCTTTTG 

1401 CGCCTTGCGA GATTGCCTTC CATCAGCAGC TTTCAACGTT CCTGATTCAC 

1451 ACGTTGGGCT ACGATATGCC GGTTTCAGGT TGTCGCGAAG GCTCGGTAAC 

1501 GGGCAACCTG ATTACGGGTG ATGCAGGCAG CTTGAACATT CGCGACGGCA 

1551 AGGCGGAATA TGTTTATCCG CAATGA 

This corresponds to the amino acid sequence <SEQ ID 304; ORF81>: 



1 MKKSFLTLVL YSSLLTASEI AYPLELGIET LPAAKIAETF ALTFVIAALY 
51 LFARNKVTRL LIAVFFAFSI IANNVHYADY QSWMT

// 

401 ...QTVFEQL QKTPDGNWLF AYTSDHGQYV RQDIYNQGTV QPDSYLVPLV 
451 LYSPDKAVQQ AANQAFAPCE IAFHQQLSTF LIHTLGYDMP VSGCREGSVT 
501 GNLITGDAGS LNIRDGKAEY VYPQ* 

Further work revealed the complete nucleotide sequence <SEQ ID 305>: 



1 ATGAAAAAAT CTTTCCTTAC GCTTGTTCTG TATTCGTCTT TACTTACCGC

51 CAGCGAAATT GCCTATCGCT TTGTATTTGG GATTGAAACC TTACCGGCGG

101 CAAAAATTGC GGAAACGTTT GCGCTGACAT TTGTGATTGC TGCGCTGTAT

151 CTGTTTGCGC GTTATAAGGT GACGCGTTTG TTGATTGCGG TGTTTTTTGC

201 GTTCAGCATT ATTGCCAACA ATGTGCATTA CGCGGTTTAT CAAAGCTGGA

251 TGACGGGCAT CAATTATTGG CTGATGCTGA AAGAGGTTAC CGAAGTCGGC

301 AGCGCGGGTG CGTCGATGTT GGATAAGTTG TGGCTGCCTG TGTTGTGGGG

351 CGTGTTGGAA GTCATGTTGT TTTGCAGCCT TGCCAAGTTC CGCCGTAAGA

401 CGCATTTTTC TGCCGATATA CTGTTTGCCT TCCTAATGCT GATGATTTTC

451 GTGCGTTCGT TCGACACGAA ACAAGAGCAC GGTATTTCGC CCAAACCGAC

501 ATACAGCCGC ATCAAAGCCA ATTATTTCAG CTTCGGTTAT TTTGTCGGAC

551 GCGTGTTGCC GTATCAGTTG TTTGATTTAA GCAGGATTCC CGCCTTTAAG

601 CAGCCTGCTC CAAGCAAAAT CGGGCAGGGC AGTGTTCAAA ATATCGTCCT

651 GATTATGGGC GAAAGCGAAA GCGCGGCGCA TTTGAAGCTG TTTGGCTACG

701 GACGCGAAAC TTCGCCGTTT TTAACCCGGC TGTCGCAAGC CGATTTTAAG

751 CCGATTGTGA AACAAAGTTA TTCCGCAGGC TTTATGACTG CAGTGTCCCT

801 GCCCAGTTTT TTCAATGCGA TACCGCACGC CAACGGCTTG GAACAAATCA

851 GCGGCGGCGA TACCAATATG TTCCGCCTCG CCAAAGAGCA GGGCTATGAA

901 ACGTATTTTT ACAGCGCGCA GGCGGAAAAC GAGATGGCGA TTTTGAACTT

951 AATCGGTAAG AAATGGATAG ACCATCTGAT TCAGCCGACG CAACTTGGCT

1001 ACGGCAACGG CGACAATATG CCCGATGAGA AGCTGCTGCC GTTGTTCGAC

1051 AAAATCAATT TGCAGCAGGG CAAGCATTTT ATCGTGTTGC ACCAACGCGG

1101 TTCGCACGCC CCATACGGCG CATTGTTGCA GCCTCAAGAT AAAGTATTCG

1151 GCGAAGCCGA TATTGTGGAT AAGTACGACA ACACCATCCA CAAAACCGAC

1201 CAAATGATTC AAACCGTATT CGAGCAGCTG CAAAAGCAGC CTGACGGCAA

1251 CTGGCTGTTT GCCTATACCT CCGATCATGG CCAGTATGTT CGCCAAGATA

1301 TCTACAATCA AGGCACGGTG CAGCCCGACA GCTATCTCGT GCCGCTAGTG

1351 TTGTACAGCC CGGATAAGGC CGTGCAACAG GCTGCCAACC AGGCTTTTGC

1401 GCCTTGCGAG ATTGCCTTCC ATCAGCAGCT TTCAACGTTC CTGATTCACA

1451 CGTTGGGCTA CGATATGCCG GTTTCAGGTT GTCGCGAAGG CTCGGTAACG

1501 GGCAACCTGA TTACGGGTGA TGCAGGCAGC TTGAACATTC GCGACGGCAA

1551 GGCGGAATAT GTTTATCCGC AATGA



This corresponds to the amino acid sequence <SEQ ID 306; ORF81-1>:



1 MKKSFLTLVL YSSLLTASEI AYRFVFGIET LPAAKIAETF ALTFVIAALY 

51 LFARYKVTRL LIAVFFAFSI IANNVHYAVY QSWMTGINYW LMLKEVTEVG 

101 SAGASMLDKL WLPVLWGVLE VMLFCSLAKF RRKTHFSADI LFAFLMLMIF 

151 VRSFDTKQEH GISPKPTYSR IKANYFSFGY FVGRVLPYQL FDLSRIPAFK 

201 QPAPSKIGQG SVQNIVLIMG ESESAAHLKL FGYGRETSPF LTRLSQADFK 

251 PIVKQSYSAG FMTAVSLPSF FNAIPHANGL EQISGGDTNM FRLAKEQGYE 

301 TYFYSAQAEN EMAILNLIGK KWIDHLIQPT QLGYGNGDNM PDEKLLPLFD 

351 KINLQQGKHF IVLHQRGSHA PYGALLQPQD KVFGEADIVD KYDNTIHKTD 

401 QMIQTVFEQL QKQPDGNWLF AYTSDHGQYV RQDIYNQGTV QPDSYLVPLV

451 LYSPDKAVQQ AANQAFAPCE IAFHQQLSTF LIHTLGYDMP VSGCREGSVT 

501 GNLITGDAGS LNIRDGKAEY VYPQ* 

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A) 

ORF81 shows 84.7% identity over an 85 aa overlap and 99.2% identity over a 121 aa overlap with
an ORF (ORF81a) from strain A of N. meningitidis:



10 20 30 40 50 60 

orf81.pep MKKSFLTLVLYSSLLTASEIAYPLELGIETLPAAKIAETFALTFVIAALYLFARNKVTRL
||||:::| I I I I I I I I I I I I I : : I I II I I I I I : I I 1 I I I I I I I i I i i I I I I I : I ll 
orf81a MKKSLFVLFLYSSLLTASEIAYRFVFGIETLPAAKMAETFALTFVIAALYLFARYKATRL

10 20 30 40 50 60 

70 80 
orf81.pep LIAVFFAFSIIANNVHYADYQSWMT

I I I I I I I I I I I I I I I I I I I I I I : I 
orf81a LIAVFFAFSIIANNVHYAVYQSWITGINYWLMLKEITEVGGAGASMLDKLWLPALWGVLE

70 80 90 100 110 120 

// 

120 130 140 

orf81.pep QTVFEQLQKTPDGNWLFAYTSDHGQYVRQD

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
orf81a IPHANGLEQISGGDIVDKYDNTIHKTDQMIQTVFEQLQKQPDGNWLFAYTSDHGQYVRQD
280 290 300 310 320 330 



150 160 170 180 190 200 

orf81.pep IYNQGTVQPDSYLVPLVLYSPDKAVQQAANQAFAPCEIAFHQQLSTFLIHTLGYDMPVSG
I I I I I I I I I I I I I II I I I I I I I I I I I I I I I i I I I I 11 I I I I I I I I I 1 I I I I I I ! I I I I I I 
orf81a IYNQGTVQPDSYLVPLVLYSPDKAVQQAANQAFAPCEIAFHQQLSTFLIHTLGYDMPVSG
340 350 360 370 380 390 



210 220 230 

orf81.pep CREGSVTGNLITGDAGSLNIRDGKAEYVYPQX
I I I I I I I I I I I I I I I I I I I I I I I t I I I I I I I I 
orf81a CREGSVTGNLITGDAGSLNIRDGKAEYVYPQX
400 410 420 

The complete length ORF81a nucleotide sequence <SEQ ID 307> is:



1 ATGAAAAAAT CCCTTTTCGT TCTCTTTCTG TATTCGTCCC TACTTACTGC 

51 CAGCGAAATT GCTTATCGCT TTGTATTCGG AATTGAAACC TTACCGGCTG 

101 CAAAAATGGC AGAAACGTTT GCGCTGACAT TTGTGATTGC TGCGCTGTAT 

151 CTGTTTGCGC GTTATAAGGC AACGCGTTTG TTGATTGCGG TGTTTTTCGC 

201 GTTCAGCATT ATTGCCAACA ATGTGCATTA CGCGGTTTAT CAAAGCTGGA 

251 TAACGGGCAT TAATTATTGG CTGATGCTGA AAGAGATTAC CGAAGTTGGC 

301 GGCGCAGGGG CGTCGATGTT GGATAAGTTG TGGCTGCCTG CGTTGTGGGG 

351 CGTGTTGGAA GTCATGTTGT TTTGCAGCCT TGCCAAGTTC CGCCGTAAGA 

401 CGCATTTTTC TGCCGATATA CTGTTTGCCT TCCTAATGCT GATGATTTTC 

451 GTGCGTTCGT TCGACACGAA ACAAGAACAC GGTATTTCGC CCAAACCGAC 

501 ATACAGCCGC ATCAAAGCCA ATTATTTCAG CTTCGGTTAT TTTGTCGGAC 

551 GCGTGTTGCC GTATCAGTTG TTTGATTTAA GCAAGATTCC TGTGTTCAAA 

601 CAGCCTGCTC CAAGCAGAAT CGGGCAAGGC AGTATTCAAA ATATCGTCCT 

651 GATTATGGGC GAAAGCGAAA GCGCGGCGCA TTTGAAATTG TTTGGCTACG 

701 GGCGCGAAAC TTCGCCGTTT TTGACCCAGC TTTCGCAAGC CGATTTTAAG 

751 CCGATTGTGA AACAAAGTTA TTCCGCAGGC TTTATGACGG CAGTATCCCT 

801 GCCCAGTTTC TTTAACGTCA TACCGCATGC CAACGGCTTG GAACAAATCA 

851 GCGGCGGCGA TATTGTGGAT AAGTACGACA ACACCATCCA CAAAACCGAC 

901 CAAATGATTC AAACCGTATT CGAGCAGCTG CAAAAGCAGC CTGACGGCAA 

951 CTGGCTGTTT GCCTATACCT CCGATCATGG CCAGTATGTT CGCCAAGATA 

1001 TCTACAATCA AGGCACGGTG CAGCCCGACA GCTATCTCGT GCCGCTGGTG 

1051 TTGTACAGCC CGGATAAGGC CGTGCAACAG GCTGCCAACC AGGCTTTTGC 

1101 GCCTTGCGAG ATTGCCTTCC ATCAGCAGCT TTCAACGTTC CTGATTCACA 

1151 CGTTGGGCTA CGATATGCCG GTTTCAGGTT GTCGCGAAGG CTCGGTAACG 

1201 GGCAACCTGA TTACGGGTGA TGCAGGCAGC TTGAACATTC GCGACGGCAA 

1251 GGCGGAATAT GTTTATCCGC AATGA 

This encodes a protein having amino acid sequence <SEQ ID 308>: 



1 MKKSLFVLFL YSSLLTASEI AYRFVFGIET LPAAKMAETF ALTFVIAALY

51 LFARYKATRL LIAVFFAFSI IANNVHYAVY QSWITGINYW LMLKEITEVG

101 GAGASMLDKL WLPALWGVLE VMLFCSLAKF RRKTHFSADI LFAFLMLMIF

151 VRSFDTKQEH GISPKPTYSR IKANYFSFGY FVGRVLPYQL FDLSKIPVFK 

201 QPAPSRIGQG SIQNIVLIMG ESESAAHLKL FGYGRETSPF LTQLSQADFK 

251 PIVKQSYSAG FMTAVSLPSF FNVIPHANGL EQISGGDIVD KYDNTIHKTD 

301 QMIQTVFEQL QKQPDGNWLF AYTSDHGQYV RQDIYNQGTV QPDSYLVPLV 

351 LYSPDKAVQQ AANQAFAPCE IAFHQQLSTF LIHTLGYDMP VSGCREGSVT 

401 GNLITGDAGS LNIRDGKAEY VYPQ* 

ORF81a and ORF81-1 show 77.9% identity in 524 aa overlap: 



10 20 30 40 50 60 

orf81a.pep MKKSLFVLFLYSSLLTASEIAYRFVFGIETLPAAKMAETFALTFVIAALYLFARYKATRL

I I I I : : : I I I I I I I I I I I I I I I I I I I ! I I : I I I I I I I I I I I I I I I I I I I I : I I I 

orf81-1 MKKSFLTLVLYSSLLTASEIAYRFVFGIETLPAAKIAETFALTFVIAALYLFARYKVTRL

10 20 30 40 50 60

70 80 90 100 110 120

orf81a.pep LIAVFFAFSIIANNVHYAVYQSWITGINYWLMLKEITEVGGAGASMLDKLWLPALWGVLE
I I I I I I I I I I I I I I I I I I I I I I I : I I I I I 1 I I i I I: I I I I : I I I I I I I I I I I |: I I I I I I 
orf81-1 LIAVFFAFSIIANNVHYAVYQSWMTGINYWLMLKEVTEVGSAGASMLDKLWLPVLWGVLE
70 80 90 100 110 120

130 140 150 160 170 180

orf81a.pep VMLFCSLAKFRRKTHFSADILFAFLMLMIFVRSFDTKQEHGISPKPTYSRIKANYFSFGY
I I 1 I I I I I I I I I I I I I I I I I I I I I I I I I I II I I ! I II I I I I I I I I I I I I I I I I I I I I I I I 
orf81-1 VMLFCSLAKFRRKTHFSADILFAFLMLMIFVRSFDTKQEHGISPKPTYSRIKANYFSFGY
130 140 150 160 170 180

190 200 210 220 230 240

orf81a.pep FVGRVLPYQLFDLSKIPVFKQPAPSRIGQGSIQNIVLIMGESESAAHLKLFGYGRETSPF
I I I I I I I I I I I I I I : I I : I I I I I I I : I II I I : I I I I 1 I I I I I II I I I I I I I I I I I I I I I I 
orf81-1 FVGRVLPYQLFDLSRIPAFKQPAPSKIGQGSVQNIVLIMGESESAAHLKLFGYGRETSPF
190 200 210 220 230 240

250 260 270 280

orf81a.pep LTQLSQADFKPIVKQSYSAGFMTAVSLPSFFNVIPHANGLEQISGGD
I I : I I II I I I I I I II I I I I I I I I I I I I I II I I : I I I II I I I I I I I I I 
orf81-1 LTRLSQADFKPIVKQSYSAGFMTAVSLPSFFNAIPHANGLEQISGGDTNMFRLAKEQGYE
250 260 270 280 290 300

orf81a.pep
orf81-1 TYFYSAQAENEMAILNLIGKKWIDHLIQPTQLGYGNGDNMPDEKLLPLFDKINLQQGKHF
310 320 330 340 350 360

290 300 310 320

orf81a.pep IVDKYDNTIHKTDQMIQTVFEQLQKQPDGNWLF
I I I I I I I I I I I I I I I I I I I I I I I I I I I I 1 I I I I 
orf81-1 IVLHQRGSHAPYGALLQPQDKVFGEADIVDKYDNTIHKTDQMIQTVFEQLQKQPDGNWLF
370 380 390 400 410 420

330 340 350 360 370 380

orf81a.pep AYTSDHGQYVRQDIYNQGTVQPDSYLVPLVLYSPDKAVQQAANQAFAPCEIAFHQQLSTF
II I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I 1 I I I II I I I I I I I 
orf81-1 AYTSDHGQYVRQDIYNQGTVQPDSYLVPLVLYSPDKAVQQAANQAFAPCEIAFHQQLSTF
430 440 450 460 470 480

390 400 410 420

orf81a.pep LIHTLGYDMPVSGCREGSVTGNLITGDAGSLNIRDGKAEYVYPQX
I I I I I I I 1 I I I I I I I I I I I I I I I I I I I I I I I I I I I 1 I I I I I I I I I 
orf81-1 LIHTLGYDMPVSGCREGSVTGNLITGDAGSLNIRDGKAEYVYPQX
490 500 510 520



Homology with a predicted ORF from N. gonorrhoeae

The aligned N- and C-terminal amino acid sequences of ORF81 and a predicted ORF (ORF81.ng) from
N. gonorrhoeae show 82.4% and 97.5% identity over 85 aa and 121 aa overlaps, respectively:



orf81.pep MKKSFLTLVLYSSLLTASEIAYPLELGIETLPAAKIAETFALTFVIAALYLFARNKVTRL 60
IIIIIIIIIIIM : : I 1 I I I I I I I : I I I II I I I : I I I I I I I I I l::|| 
orf81ng MKKSLFVLFLYSSLLTASEIAYRFVFGIETLPAAKMAETFALTFMIAALYLFARYKASRL 60

orf81.pep LIAVFFAFSIIANNVHYADYQSWMT 85
I I I I I 1 I I I : I I I I I I I I I I I I I I 
orf81ng LIAVFFAFSMIANNVHYAVYQSWMTGINYWLMLKEVTEVGSAGASMLDKLWLPALWGVAE 120

//

orf81.pep QTVFEQLQKTPDGNWLFAYTSDHGQYVRQD 433
I I I I I II I I I I II I I I I I I I I I I I 1 I I I I 
orf81ng ALLQPQDKVFGEADIVDKYDNTIHKTDQMIQTVFEQLQKQPDGNWLFAYTSDHGQYVRQD 433

orf81.pep IYNQGTVQPDSYLVPLVLYSPDKAVQQAANQAFAPCEIAFHQQLSTFLIHTLGYDMPVSG 493
I I I I I I II II I I : I I I I I I I I I I I I I I I I I I I I I II I I I I M I I I I I II I I I I I I I I I I I 
orf81ng IYNQGTVQPDSYIVPLVLYSPDKAVQQAANQAFAPCEIAFHQQLSTFLIHTLGYDMPVSG 493

orf81.pep CREGSVTGNLITGDAGSLNIRDGKAEYVYPQ 524
I I I I I I I I I I I I I I I I I I I I I : I I I I I II I I 
orf81ng CREGSVTGNLITGDAGSLNIRNGKAEYVYPQ 524

The complete length ORF81ng nucleotide sequence <SEQ ID 309> is: 



1 ATGAAAAAAT CCCTTTTCGT TCTCTTTCTG TATTCATCCC TACTTACCGC

51 CAGCGAAATC GCCTATCGCT TTGTATTCGG AATTGAAACC TTACCGGCTG

101 CAAAAATGGC GGAAACGTTT GCGCTGACAT TTATGATTGC TGCGCTGTAT

151 CTGTTTGCGC GTTATAAGGC TTCGCGGCTG CTGATTGCGG TGTTTTTCGC

201 GTTCAGCATG ATTGCCAACA ATGTGCATTA CGCGGTTTAT CAAAGCTGGA

251 TGACGGGTAT TAACTATTGG CTGATGCTGA AAGAGGTTAC CGAAGTCGGC

301 AGCGCGGGCG CGTCGATGTT GGATAAGTTG TGGCTGCCTG CTTTGTGGGG

351 CGTGGCGGAA GTCATGTTGT TTTGCAGCCT TGCCAAGTTC CGCCGTAAGA

401 CGCATTTTTC TGCCGATATA CTGTTTGCCT TCCTAATGCT GATGATTTTC

451 GTGCGTTCGT TCGACACGAA ACAAGAGCAC GGTATTTCGC CCAAACCGAC

501 ATACAGCCGC ATCAAAGCCA ATTATTTCAG CTTCGGTTAT TTTGTCGGGC

551 GCGTGTTGCC GTATCAGTTG TTTGATTTAA GCAAGATCCC TGTGTTCAAA

601 CAGCCTGCTC CAAGCAAAAT CGGGCAAGGC AGTATTCAAA ATATCGTCCT

651 GATTATGGGC GAAAGCGAAA GCGCGGCGCA TTTGAAATTG TTTGGTTACG

701 GGCGCGAAAC TTCGCCGTTT TTAACCCGGC TGTCGCAAGC CGATTTTAAG

751 CCGATTGTGA AACAAAGTTA TTCCGCAGGC TTTATGACGG CAGTATCCCT

801 GCCCAGTTTC TTTAACGTCA TACCGCACGC CAACGGCTTG GAACAAATCA

851 GCGGCGGCGA TACCAATATG TTCCGCCTCG CCAAAGAGCA GGGCTATGAA

901 ACGTATTTTT ACAGTGCCCA GGCTGAAAAC CAAATGGCAA TTTTGAACTT

951 AATCGGTAAG AAATGGATAG ACCATCTGAT TCAGCCGACG CAACTTGGCT

1001 ACGGCAACGG CGACAATATG CCCGATGAGA AGCTGCTGCC GTTGTTCGAC

1051 AAAATCAATT TGCAGCAGGG CAGGCATTTT ATCGTGTTGC ACCAACGCGG

1101 TTCGCACGCC CCATACGGCG CATTGTTGCA GCCTCAAGAT AAAGTATTCG

1151 GCGAAGCCGA TATTGTGGAT AAGTACGACA ACACCATCCA CAAAACCGAC

1201 CAAATGATTC AAACCGTATT CGAGCAGCTG CAAAAGCAGC CTGACGGCAA

1251 CTGGCTGTTT GCCTATACCT CCGATCATGG CCAGTATGTG CGCCAAGATA

1301 TCTACAATCA AGGCACGGTG CAGCCCGACA GCTATATTGT GCCTCTGGTT

1351 TTGTACAGCC CGGATAAGGC CGTGCAACAG GCTGCCAACC AGGCTTTTGC

1401 GCCTTGCGAG ATTGCCTTCC ATCAGCAGCT TTCAACGTTC CTGATTCACA

1451 CGTTGGGCTA CGATATGCCG GTTTCAGGTT GTCGCGAAGG CTCGGTAACA

1501 GGCAACCTGA TTACGGGCGA TGCAGGCAGC TTGAACATTC GCAACGGCAA

1551 GGCGGAATAT GTTTATCCGC AATAA



This encodes a protein having amino acid sequence <SEQ ID 310>: 



1 MKKSLFVLFL YSSLLTASEI AYRFVFGIET LPAAKMAETF ALTFMIAALY 

51 LFARYKASRL LIAVFFAFSM IANNVHYAVY QSWMTGINYW LMLKEVTEVG

101 SAGASMLDKL WLPALWGVAE VMLFCSLAKF RRKTHFSADI LFAFLMLMIF

151 VRSFDTKQEH GISPKPTYSR IKANYFSFGY FVGRVLPYQL FDLSKIPVFK 

201 QPAPSKIGQG SIQNIVLIMG ESESAAHLKL FGYGRETSPF LTRLSQADFK 

251 PIVKQSYSAG FMTAVSLPSF FNVIPHANGL EQISGGDTNM FRLAKEQGYE 

301 TYFYSAQAEN QMAILNLIGK KWIDHLIQPT QLGYGNGDNM PDEKLLPLFD 

351 KINLQQGRHF IVLHQRGSHA PYGALLQPQD KVFGEADIVD KYDNTIHKTD 

401 QMIQTVFEQL QKQPDGNWLF AYTSDHGQYV RQDIYNQGTV QPDSYIVPLV 

451 LYSPDKAVQQ AANQAFAPCE IAFHQQLSTF LIHTLGYDMP VSGCREGSVT

501 GNLITGDAGS LNIRNGKAEY VYPQ* 

ORF81ng and ORF81-1 show 96.4% identity in 524 aa overlap: 



10 20 30 40 50 60 

orf81ng-1.pep MKKSLFVLFLYSSLLTASEIAYRFVFGIETLPAAKMAETFALTFMIAALYLFARYKASRL

I I I I : : : I I I II I I I I I I I I I I I I I I ! I I I I I I I : I II I I I II : I I I I I I I I I I I : : I I 
orf81-1 MKKSFLTLVLYSSLLTASEIAYRFVFGIETLPAAKIAETFALTFVIAALYLFARYKVTRL

10 20 30 40 50 60

70 80 90 100 110 120

orf81ng-1.pep LIAVFFAFSMIANNVHYAVYQSWMTGINYWLMLKEVTEVGSAGASMLDKLWLPALWGVAE
I I I I I I I II : I I I I I I I I I I I I I I I I I II I I I I I I I I I I I II II I I I I I I I I I : I I I I I 
orf81-1 LIAVFFAFSIIANNVHYAVYQSWMTGINYWLMLKEVTEVGSAGASMLDKLWLPVLWGVLE

70 80 90 100 110 120

130 140 150 160 170 180

orf81ng-1.pep VMLFCSLAKFRRKTHFSADILFAFLMLMIFVRSFDTKQEHGISPKPTYSRIKANYFSFGY
I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I 1 I I I I I M M I I I I I I I I I I I I I I I I 
orf81-1 VMLFCSLAKFRRKTHFSADILFAFLMLMIFVRSFDTKQEHGISPKPTYSRIKANYFSFGY

130 140 150 160 170 180

190 200 210 220 230 240

orf81ng-1.pep FVGRVLPYQLFDLSKIPVFKQPAPSKIGQGSIQNIVLIMGESESAAHLKLFGYGRETSPF

I | | I I | I | I I I II I : I I : I M I I I I I I I I I I : M II I I I I I I I I M I I I I I I I I I I I ! M 
orf81-1 FVGRVLPYQLFDLSRIPAFKQPAPSKIGQGSVQNIVLIMGESESAAHLKLFGYGRETSPF
190 200 210 220 230 240

250 260 270 280 290 300

orf81ng-1.pep LTRLSQADFKPIVKQSYSAGFMTAVSLPSFFNVIPHANGLEQISGGDTNMFRLAKEQGYE
I I | | | I I I I I I I I I I I I I I I I I I I I I I I I I I i : I I I I I I I I I I I I I I ! I I II I I I I I I I I 
orf81-1 LTRLSQADFKPIVKQSYSAGFMTAVSLPSFFNAIPHANGLEQISGGDTNMFRLAKEQGYE

250 260 270 280 290 300

310 320 330 340 350 360

orf81ng-1.pep TYFYSAQAENQMAILNLIGKKWIDHLIQPTQLGYGNGDNMPDEKLLPLFDKINLQQGRHF
M I I I M I I I : I I I I I I I I I I II 1 I I I I I M I I I I M I M I I I i II I I I I II I I I M : II 
orf81-1 TYFYSAQAENEMAILNLIGKKWIDHLIQPTQLGYGNGDNMPDEKLLPLFDKINLQQGKHF

310 320 330 340 350 360

370 380 390 400 410 420

orf81ng-1.pep IVLHQRGSHAPYGALLQPQDKVFGEADIVDKYDNTIHKTDQMIQTVFEQLQKQPDGNWLF
I I I I I I I I I I I I I I I II I I I I II I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I 
orf81-1 IVLHQRGSHAPYGALLQPQDKVFGEADIVDKYDNTIHKTDQMIQTVFEQLQKQPDGNWLF

370 380 390 400 410 420

430 440 450 460 470 480

orf81ng-1.pep AYTSDHGQYVRQDIYNQGTVQPDSYIVPLVLYSPDKAVQQAANQAFAPCEIAFHQQLSTF
I I I I MM II I I I II llllil III 1:11 III II II M II Ml III I I M Ml II II III I 
orf81-1 AYTSDHGQYVRQDIYNQGTVQPDSYLVPLVLYSPDKAVQQAANQAFAPCEIAFHQQLSTF

430 440 450 460 470 480

490 500 510 520

orf81ng-1.pep LIHTLGYDMPVSGCREGSVTGNLITGDAGSLNIRNGKAEYVYPQX
I I M I I I I I I I I M I I I I I M I I I M M I I I I I I : I I I I I I I I I I 
orf81-1 LIHTLGYDMPVSGCREGSVTGNLITGDAGSLNIRDGKAEYVYPQX

490 500 510 520



Furthermore, ORF81ng shows significant homology to an E. coli OMP:

gi|1256380 (U50906) outer membrane adherence protein-associated protein [E. coli]
Length = 547
Score = 87.4 bits (213), Expect = 2e-16

Identities = 122/468 (26%), Positives = 198/468 (42%), Gaps = 70/468 (14%)

Query: 25 VFGIETLPAAKMAETFA-LTFMIAALYLFARYKAS--RLLIAVFFAFSMIANNVHYAVYQ 81
VFGI LA+A LF+++R + RLL+A F + A ++ ++Y 
Sbjct: 29 VFGITNLVASSGAHMVQRLLFFVLTILWKRISSLPLRLLVAAPFVL-LTAADMSISLY- 86

Query: 82 SW MT GINYWLMLKEVTEVGSAGASMLDKLWLPALWGVAEVMLFCSLAKFRRKT 134
SW T G ++ + EV A ML ++ P L A + L + 
Sbjct: 87 SWCTFGTTFNDGFAI SVLQSDPDEV AKMLG-MYSPYLCAFAFLSLLFLAVI IKYDV 141

Query: 135
Sbjct: 142
Query: 184
Sbjct: 202
Query: 242
Sbjct: 258
Query: 299
Sbjct: 311
Query: 356
Sbjct: 360

L+L++
+Q L + +P F+
-DTKQEHGISPKPTYSRIKAN--YFSFGYFVG 183
D K ++ SP SR +F+ YF
VLI+GES
+Q
Q+ S
TA+S+P
+ +V+ H
++ L+GY R T+P +
I N+ +A + G
- DIHNYPDNI INMANQAG 310
++T++ S+Q+ +N A+ ++
++ + Y G
- AMRAMET VYVRG F-
DE LLP + Q
-DELLLPHLSQALQQ 359
Q + IVLH GSH P +
VF
D D YDN+IH TD ++ VFE L+

Query: 413 QPDGNWLFAYTSDHG QYVRQDIYNQG--TVQPDSYIVPL-VLYSP 454
D Y +DHG ++++Y G +Y VP+ + YSP
Sbjct: 419 --DRRASVMYFADHGLERDPTKKNVYFHGGREASQQAYHVPMFIWYSP 464

Based on this analysis, including the presence of a putative leader sequence (double-underlined) 
and several putative transmembrane domains (single-underlined) in the gonococcal protein, it is 
predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be 
useful antigens for vaccines or diagnostics, or for raising antibodies. 
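Putative transmembrane segments such as those mentioned above are often flagged from a sliding-window hydropathy profile. The sketch below assumes the Kyte-Doolittle scale, a 19-residue window and a 1.6 threshold, which are conventional choices but are not stated in the source; it is not necessarily the method used to mark the underlined domains. It is applied to residues 1-120 of ORF81ng as listed above.

```python
# Kyte-Doolittle hydropathy values (assumed scale).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobic_windows(protein: str, window: int = 19, threshold: float = 1.6):
    """Return (start, end, mean) for windows whose mean hydropathy exceeds threshold.

    Positions are 1-based and inclusive; a run of overlapping hits points to
    one candidate membrane-spanning segment.
    """
    seq = "".join(protein.split()).rstrip("*")
    hits = []
    for i in range(len(seq) - window + 1):
        mean = sum(KD[aa] for aa in seq[i:i + window]) / window
        if mean > threshold:
            hits.append((i + 1, i + window, round(mean, 2)))
    return hits


orf81ng_1_120 = (
    "MKKSLFVLFLYSSLLTASEIAYRFVFGIETLPAAKMAETFALTFMIAALYLFARYKASRL"
    "LIAVFFAFSMIANNVHYAVYQSWMTGINYWLMLKEVTEVGSAGASMLDKLWLPALWGVAE"
)
for start, end, mean in hydrophobic_windows(orf81ng_1_120):
    print(start, end, mean)
```

With these parameters the window 61-79 (LIAVFFAFSMIANNVHYAV) scores about 1.61 and is reported; other window sizes or thresholds will shift the boundaries, which is why such predictions are described as putative.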

Example 37 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 311>:

1 ...ACCCTGCTCC TCTTCATCCC CCTCGTCCTC ACAC.GTGCG GCACACTGAC

51 CGGCATACTC GCCCaCGGCG GCGGCAAACG CTTTGCCGTC GAACAAGAAC 

101 TCGTCGCCGC ATCGTCCCGC GCCGCCGTCA AAGAAATGGA TTTGTCCGCC 

151 yTAAAAGGAC GCAAAGCCGC CyTTTACGTC TCCGTTATGG GCGACCAAGG 

201 TTCGGGCAAC ATAAGCGGCG GACGCTACTC TATCGACGCA CTGATACGCG 

251 GCGGCTACCA CAACAACCCC GAAAGTGCCA CCCAATACAG CTACCCCGCC 

301 TACGACACTA CCGCCACCAC CAAATCCGAC GCGCTCTCCA GCGTAACCAC 

351 TTCCACATCG CTTTTGAACG CCCCCGCCGC CGyCyTGACG AAAAACAGCG 

401 GACGCAAAGG CGAACGcTCC GCCGGACTGT CCGTCAACGG CACGGGCGAC 

451 TACCGCAACG AAACCCTGCT CGCCAACCCC CGCGACGTTT CCTTCCTGAC 

501 CAACCTCATC CAAACCGTCT TCTACCTGCG CGGCATCGAA GTCgTACCGC 

551 CCGrATACGC CGACACCGAC GTATTCGTAA CCGTCGACGT A...

This corresponds to the amino acid sequence <SEQ ID 312; ORF83>: 

1 ..TLLLFIPLVL TXCGTLTGIL AHGGGKRFAV EQELVAASSR AAVKEMDLSA 

51 LKGRKAAXYV SVMGDQGSGN ISGGRYSIDA LIRGGYHNNP ESATQYSYPA 

101 YDTTATTKSD ALSSVTTSTS LLNAPAAXLT KNSGRKGERS AGLSVNGTGD 

151 YRNETLLANP RDVSFLTNLI QTVFYLRGIE VVPPXYADTD VFVTVDV..

Further work revealed the complete nucleotide sequence <SEQ ID 313>: 



1 ATGAAAACCC TGCTCCTCCT CATCCCCCTC GTCCTCACAG CCTGCGGCAC 

51 ACTGACCGGC ATACCCGCCC ACGGCGGCGG CAAACGCTTT GCCGTCGAAC 

101 AAGAACTCGT CGCCGCATCG TCCCGCGCCG CCGTCAAAGA AATGGATTTG 

151 TCCGCCCTAA AAGGACGCAA AGCCGCCCTT TACGTCTCCG TTATGGGCGA 

201 CCAAGGTTCG GGCAACATAA GCGGCGGACG CTACTCTATC GACGCACTGA 

251 TACGCGGCGG CTACCACAAC AACCCCGAAA GTGCCACCCA ATACAGCTAC 

301 CCCGCCTACG ACACTACCGC CACCACCAAA TCCGACGCGC TCTCCAGCGT 

351 AACCACTTCC ACATCGCTTT TGAACGCCCC CGCCGCCGCC CTGACGAAAA 

401 ACAGCGGACG CAAAGGCGAA CGCTCCGCCG GACTGTCCGT CAACGGCACG 

451 GGCGACTACC GCAACGAAAC CCTGCTCGCC AACCCCCGCG ACGTTTCCTT 

501 CCTGACCAAC CTCATCCAAA CCGTCTTCTA CCTGCGCGGC ATCGAAGTCG 

551 TACCGCCCGA ATACGCCGAC ACCGACGTAT TCGTAACCGT CGACGTATTC 

601 GGCACCGTCC GCAGCCGTAC CGAACTGCAC CTCTACAACG CCGAAACCCT 

651 TAAAGCCCAA ACCAAGCTCG AATATTTCGC CGTTGACCGC GACAGCCGGA 

701 AACTGCTGAT TACCCCTAAA ACCGCCGCCT ACGAATCCCA ATACCAAGAA 

751 CAATACGCCC TTTGGACCGG CCCTTACAAA GTCAGCAAAA CCGTCAAAGC 

801 CTCAGACCGC CTGATGGTCG ATTTCTCCGA CATTACCCCC TACGGCGACA 

851 CAACCGCCCA AAACCGTCCC GACTTCAAAC AAAACAACGG TAAAAAACCC 

901 GATGTCGGCA ACGAAGTCAT CCGCCGCCGC AAAGGAGGAT AA 

This corresponds to the amino acid sequence <SEQ ID 314; ORF83-1>:

1 MKTLLLLIPL VLTACGTLTG IPAHGGGKRF AVEQELVAAS SRAAVKEMDL

51 SALKGRKAAL YVSVMGDQGS GNISGGRYSI DALIRGGYHN NPESATQYSY

101 PAYDTTATTK SDALSSVTTS TSLLNAPAAA LTKNSGRKGE RSAGLSVNGT

151 GDYRNETLLA NPRDVSFLTN LIQTVFYLRG IEVVPPEYAD TDVFVTVDVF

201 GTVRSRTELH LYNAETLKAQ TKLEYFAVDR DSRKLLITPK TAAYESQYQE

251 QYALWTGPYK VSKTVKASDR LMVDFSDITP YGDTTAQNRP DFKQNNGKKP

301 DVGNEVIRRR KGG*
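The correspondence between <SEQ ID 313> and ORF83-1 is a codon-by-codon translation from the initial ATG. A minimal sketch of that step, assuming the standard genetic code and mapping any codon that contains an ambiguity letter (N, y, r and so on) to X, in the way unresolved residues are written in the listings above:

```python
from itertools import product

# Standard genetic code: amino acids for codons in TCAG order
# (TTT, TTC, TTA, TTG, TCT, ...); '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate an in-frame coding sequence, ignoring the spaces between groups."""
    seq = "".join(dna.split()).upper()
    usable = len(seq) - len(seq) % 3  # a trailing partial codon is ignored
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, usable, 3))


print(translate("ATGAAAACCC TGCTCCTCCT CATCCCCCTC"))  # MKTLLLLIPL
```

Rows of the nucleotide listings start at positions 1, 51, 101 and so on, so a row is in frame only when its start minus one is a multiple of 3. Row 901 of <SEQ ID 313> qualifies and translates to the C-terminus DVGNEVIRRRKGG*.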

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A) 

ORF83 shows 96.4% identity over a 197 aa overlap with an ORF (ORF83a) from strain A of N.
meningitidis:

10 20 30 40 50 

orf83.pep TLLLFIPLVLTXCGTLTGILAHGGGKRFAVEQELVAASSRAAVKEMDLSALKGRKAAX

III :|lllli I I I t I M I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
orf83a MKTLLXLIPLVLTACGTLTGIPAHGGGKRFAVEQELVAASSRAAVKEMDLSALKGRKAAL

10 20 30 40 50 60 

60 70 80 90 100 110 

orf83.pep YVSVMGDQGSGNISGGRYSIDALIRGGYHNNPESATQYSYPAYDTTATTKSDALSSVTTS

I I I I I I I I II I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
orf83a YVSVMGDQGSGNISGGRYSIDALIRGGYHNNPESATQYSYPAYDTTATTKSDALSSVTTS

70 80 90 100 110 120 



120 130 140 150 160 170 

orf83.pep TSLLNAPAAXLTKNSGRKGERSAGLSVNGTGDYRNETLLANPRDVSFLTNLIQTVFYLRG
I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I 1 I I II I I I I I I I I I 
orf83a TSLLNAPAAALTKNSGRKGERSAGLSVNGTGDYRNETLLANPRDVSFLTNLIQTVFYLRG
130 140 150 160 170 180 



180 190 
orf83.pep IEVVPPXYADTDVFVTVDV
I I I I I I M M I I I I I I I I 

orf83a IEVVPPEYADTDVFVTVDVFGTVRSRTELHLYNAETLKAQTKLEYFAVDRDSRKLLIAPK
190 200 210 220 230 240 

The complete length ORF83a nucleotide sequence <SEQ ID 315> is: 



1 ATGAAAACCC TGCTCNTCCT CATCCCCCTC GTCCTCACAG CCTGCGGCAC 

51 ACTGACCGGC ATACCCGCCC ACGGCGGCGG CAAACGCTTT GCCGTCGAAC 

101 AAGAACTCGT CGCCGCATCG TCCCGCGCCG CCGTCAAAGA AATGGACTTG 

151 TCCGCCCTGA AAGGACGCAA AGCCGCCCTT TACGTCTCCG TTATGGGCGA 

201 CCAAGGTTCG GGCAACATAA GCGGCGGACG CTACTCTATC GACGCACTGA 

251 TACGCGGCGG CTACCACAAC AACCCCGAAA GTGCCACCCA ATACAGCTAC 

301 CCCGCCTACG ACACTACCGC CACCACCAAA TCCGACGCGC TCTCCAGCGT 

351 AACCACTTCC ACATCGCTTT TGAACGCCCC CGCCGCCGCC CTGACGAAAA 

401 ACAGCGGACG CAAAGGCGAA CGCTCCGCCG GACTGTCCGT CAACGGCACG 

451 GGCGACTACC GCAACGAAAC CCTGCTCGCC AACCCCCGCG ACGTTTCCTT 

501 CCTGACCAAC CTCATCCAAA CCGTCTTCTA CCTGCGCGGC ATCGAAGTCG 

551 TACCGCCCGA ATACGCCGAC ACCGACGTAT TCGTAACCGT CGACGTATTC 

601 GGCACCGTCC GCAGCCGCAC CGAACTGCAC CTCTACAACG CCGAAACCCT 

651 TAAAGCCCAA ACCAAGCTCG AATATTTCGC CGTTGACCGC GACAGCCGGA 

701 AACTGCTGAT TGCCCCTAAA ACCGCCGCCT ACGAATCCCA ATACCAAGAA 

751 CAATACGCCC TCTGGATGGG ACCTTACAGC GTCGGCAAAA CCGTCAAAGC 

801 CTCAGACCGC CTGATGGTCG ATTTCTCCGA CATCACCCCC TACGGCGACA 

851 CAACCGCCCA AAACCGTCCC GACTTCAAAC AAAACAACGG TAAAAAACCC 

901 GATGTCGGCA ACGAAGTCAT CCGCCGCCGC AAAGGAGGAT AA 

This encodes a protein having amino acid sequence <SEQ ID 316>:



1 MKTLLXLIPL VLTACGTLTG IPAHGGGKRF AVEQELVAAS SRAAVKEMDL

51 SALKGRKAAL YVSVMGDQGS GNISGGRYSI DALIRGGYHN NPESATQYSY

101 PAYDTTATTK SDALSSVTTS TSLLNAPAAA LTKNSGRKGE RSAGLSVNGT

151 GDYRNETLLA NPRDVSFLTN LIQTVFYLRG IEVVPPEYAD TDVFVTVDVF

201 GTVRSRTELH LYNAETLKAQ TKLEYFAVDR DSRKLLIAPK TAAYESQYQE 

251 QYALWMGPYS VGKTVKASDR LMVDFSDITP YGDTTAQNRP DFKQNNGKKP 

301 DVGNEVIRRR KGG* 

ORF83a and ORF83-1 show 98.4% identity in 313 aa overlap: 



10 20 30 40 50 60 

orf83a.pep MKTLLXLIPLVLTACGTLTGIPAHGGGKRFAVEQELVAASSRAAVKEMDLSALKGRKAAL
I I I I I I I I I I i I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I II I I I I I 
orf83-1 MKTLLLLIPLVLTACGTLTGIPAHGGGKRFAVEQELVAASSRAAVKEMDLSALKGRKAAL
10 20 30 40 50 60 

70 80 90 100 110 120 

orf83a.pep YVSVMGDQGSGNISGGRYSIDALIRGGYHNNPESATQYSYPAYDTTATTKSDALSSVTTS
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
orf83-1 YVSVMGDQGSGNISGGRYSIDALIRGGYHNNPESATQYSYPAYDTTATTKSDALSSVTTS
70 80 90 100 110 120 

130 140 150 160 170 180 

orf83a.pep TSLLNAPAAALTKNSGRKGERSAGLSVNGTGDYRNETLLANPRDVSFLTNLIQTVFYLRG
III I I I I II IIMMI II i I I Ml I III Ml tl II I I I II I M M I Ml I Ml IMM II 

orf83-1 TSLLNAPAAALTKNSGRKGERSAGLSVNGTGDYRNETLLANPRDVSFLTNLIQTVFYLRG
130 140 150 160 170 180 

190 200 210 220 230 240 

orf83a.pep IEVVPPEYADTDVFVTVDVFGTVRSRTELHLYNAETLKAQTKLEYFAVDRDSRKLLIAPK
M I II II II IMMM II I I IMMMI IIMMI Ml II II I I MM I II I II II MM 
orf83-1 IEVVPPEYADTDVFVTVDVFGTVRSRTELHLYNAETLKAQTKLEYFAVDRDSRKLLITPK
190 200 210 220 230 240 

250 260 270 280 290 300 

orf83a.pep TAAYESQYQEQYALWMGPYSVGKTVKASDRLMVDFSDITPYGDTTAQNRPDFKQNNGKKP
I I I I I I I I I I I I I I I I I I : I : I I I I I I I! II I II I I M I I I I I I II II I I I II M I M I 
orf83-1 TAAYESQYQEQYALWTGPYKVSKTVKASDRLMVDFSDITPYGDTTAQNRPDFKQNNGKKP
250 260 270 280 290 300 

310 

orf83a.pep DVGNEVIRRRKGGX
I I II I II I I I I II I 
orf83-1 DVGNEVIRRRKGGX
310 
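The identity figures quoted for these alignments are the number of identical positions divided by the length of the aligned overlap. The helper below is a hypothetical sketch of that arithmetic; it assumes equal-length aligned strings and counts gap columns ("-") in the denominator but never as matches, a convention the source does not state. For the 313 aa overlap above, the sequences differ at positions 6, 238, 256, 260 and 262, so 308/313 gives the quoted 98.4%.

```python
def percent_identity(a: str, b: str) -> float:
    """Percentage of identical, non-gap positions between two aligned strings."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    if not a:
        raise ValueError("empty alignment")
    matches = sum(1 for x, y in zip(a, b) if x == y and x != "-")
    return 100.0 * matches / len(a)


# Residues 241-300 of ORF83a and ORF83-1, copied from the alignment above.
orf83a_241_300 = "TAAYESQYQEQYALWMGPYSVGKTVKASDRLMVDFSDITPYGDTTAQNRPDFKQNNGKKP"
orf83_1_241_300 = "TAAYESQYQEQYALWTGPYKVSKTVKASDRLMVDFSDITPYGDTTAQNRPDFKQNNGKKP"
print(percent_identity(orf83a_241_300, orf83_1_241_300))  # 95.0 (57 of 60)
print(round(100.0 * 308 / 313, 1))  # 98.4, the whole-overlap figure
```

An X (unresolved residue) never equals a real residue, so it counts as a mismatch, which is how position 6 contributes to the five differences.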

Homology with a predicted ORF from N. gonorrhoeae

ORF83 shows 94.9% identity over a 197 aa overlap with a predicted ORF (ORF83.ng) from N.
gonorrhoeae:

orf83.pep TLLLFIPLVLTXCGTLTGILAHGGGKRFAVEQELVAASSRAAVKEMDLSALKGRKAAX 58

I I II : II II I I I I I I I I I I I I I I I I I I II I I I I I I I I I II I I I I I I I II I I I I I I 
orf83ng MKTLLLLIPLVLTACGTLTGIPAHGGGKRFAVEQELVAASSRAAVKEMDLSALKGRKAAL 60

orf83.pep YVSVMGDQGSGNISGGRYSIDALIRGGYHNNPESATQYSYPAYDTTATTKSDALSSVTTS 118

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I : I I I : I I I I I I I I I I I I I I I I I I : II I I 
orf83ng YVSVMGDQGSGNISGGRYSIDALIRGGYHNNPDSATRYSYPAYDTTATTKSDALSGVTTS 120

orf83.pep TSLLNAPAAXLTKNSGRKGERSAGLSVNGTGDYRNETLLANPRDVSFLTNLIQTVFYLRG 178

MM I II II MM: II II ill! I MM II I IIMMI II Mill HI I I II MINIM 
orf83ng TSLLNAPAAALTKNNGRKGERSAGLSVNGTGDYRNETLLANPRDVSFLTNLIQTVFYLRG 180

orf83.pep IEVVPPXYADTDVFVTVDV 197
I I I I I I I I I II I I I I II I 

orf83ng IEVVPPEYADTDVFVTVDVFGTVRSRTELHLYNAETLKAQTKLEYFAVDRDSRKLLIAPK 240

The complete length ORF83ng nucleotide sequence <SEQ ID 317> is:

1 ATGAAAACCC TGCTCCTCCT CATCCCCCTC GTACTCACCG CCTGCGGCAC 

51 ACTGACCGGC ATACCCGCCC ACGGCGGCGG CAAACGCTTT GCCGTCGAAC 

101 AGGAACTCGT CGCCGCATCG TCCCGCGCCG CCGTCAAAGA AATGGACTTG 

151 TCCGCCCTGA AAGGACGCAA AGCCGCCCTT TACGTCTCCG TTATGGGCGA 

201 CCAAGGTTCG GGCAACATAA GCGGCGGACG CTACTCCATC GACGCACTGA 

251 TACGCGGCGG CTACCACAAC AACCCCGACA GCGCCACCCG ATACAGCTAC 

301 CCCGCCTATG ACACTACCGC CACCACCAAA TCCGACGCGC TCTCCGGCGT 

351 AACCACTTCC ACATCGCTTT TGAACGCCCC CGCCGCCGCC CTGACGAAAA 

401 ACAACGGACG CAAAGGCGAA CGCTCCGCCG GACTGTCCGT CAACGGCACG 

451 GGCGACTACC GCAACGAAAC CCTGCTCGCC AACCCCCGCG ACGTTTCCTT

501 CCTGACCAAC CTCATCCAAA CCGTCTTCTA CCTGCGCGGC ATCGAAGTCG 

551 TACCGCCCGA ATACGCCGAC ACCGACGTAT TCGTAACCGT CGACGTATTC 

601 GGCACCGTCC GCAGCCGTAC CGAACTGCAC CTCTACAACG CCGAAACCCT 



651 TAAAGCCCAA ACCAAGCTCG AATATTTCGC CGTCGACCGC GACAGCCGGA

701 AACTGCTGAT TGCCCCTAAA ACCGCCGCCT ACGAATCCCA ATACCAAGAA

751 CAATACGCCC TCTGGATGGG ACCTTACAGC GTCGGCAAAA CCGTCAAAGC

801 CTCAGACCGC CTGATGGTCG ATTTCTCCGA CATCACCCCC TACGGCGACA

851 CAACCGCCCA AAACCGTCCC GACTTCAAAC AAAACAACGG TAAAAACCCC

901 GATGTCGGCA ACGAAGTCAT CCGCCGCCGC AAAGGAGGAT AA



This encodes a protein having amino acid sequence <SEQ ID 318>: 

1 MKTLLLLIPL VLTACGTLTG IPAHGGGKRF AVEQELVAAS SRAAVKEMDL

51 SALKGRKAAL YVSVMGDQGS GNISGGRYSI DALIRGGYHN NPDSATRYSY

101 PAYDTTATTK SDALSGVTTS TSLLNAPAAA LTKNNGRKGE RSAGLSVNGT

151 GDYRNETLLA NPRDVSFLTN LIQTVFYLRG IEVVPPEYAD TDVFVTVDVF

201 GTVRSRTELH LYNAETLKAQ TKLEYFAVDR DSRKLLIAPK TAAYESQYQE

251 QYALWMGPYS VGKTVKASDR LMVDFSDITP YGDTTAQNRP DFKQNNGKNP

301 DVGNEVIRRR KGG*

ORF83ng and ORF83-1 show 97.1% identity in a 313 aa overlap:



20 



25 



30 



35 



40 



45 



50 



orf 83-1 .pep 
orf83ng 



orf 83-1. pep 
orf83ng 



orf 83-1. pep 
orf 83ng 



orf 83-1. pep 
orf 83ng 



orf83-l.pep 
orf83ng 

orf 83-1. pep 
orf83ng 



10 20 30 40 50 60 

MKTLLLLIPLVLTACGTLTGIPAHGGGKRFAVEQELVAASSRAAVKEMDLSALKGRKAAL 
M II M MINI I M Ml I III I II I Ml M I III II I Ml II I M I Ml I II MflMI 
MKTLLLLIPLVLTACGTLTGIPAHGGGKRFAVEQELVAASSRAAVKEMDLSALKGRKAAL 

10 20 30 40 50 60 

70 80 90 100 110 120 

YVSVMGDQGSGNI SGGRYS I DALIRGGYHNNPESATQYS YPAYDTTATTKSDALS SVTTS 
I I I I I I M I I I I I I I I I I I I I I I I I I I I I II I : I I I : I I I I I I I I I I I I I I I I I I : I I I I 
YVSVMGDQGSGNI SGGRYS IDALIRGGYHNNPDSATRYSYPAYDTTATTKSDALSGVTTS 

70 80 90 100 110 120 

130 140 150 160 170 180 

TSLLNAPAAALTKNSGRKGERSAGLSVNGTGDYRNETLLANPRDVSFLTNLIQTVFYLRG
I I II I I I II I I II I : I M M II I I I I I I M I I I I I I I I I I I I I I M I I I I I I I I M I I I I 
TSLLNAPAAALTKNNGRKGERSAGLSVNGTGDYRNETLLANPRDVSFLTNLIQTVFYLRG 

130 140 150 160 170 180 

190 200 210 220 230 240 

IEVVPPEYADTDVFVTVDVFGTVRSRTELHLYNAETLKAQTKLEYFAVDRDSRKLLITPK
I I I I M I M I M I I I I I ! M I I I I I I I I I I I I II I I I I I I I I M I I I I I I I I I 1 I I h II 
IEVVPPEYADTDVFVTVDVFGTVRSRTELHLYNAETLKAQTKLEYFAVDRDSRKLLIAPK

190 200 210 220 230 240 

250 260 270 280 290 300 

TAAYESQYQEQYALWTGPYKVSKTVKASDRLMVDFSDITPYGDTTAQNRPDFKQNNGKKP 
I I I I M I I I I I I I I I M I : I : I I I M I I II I IE I I I I I I I I I | I I I I I I j I | I i M f : I 
TAAYESQYQEQYALWMGPYSVGKTVKASDRLMVDFSDITPYGDTTAQNRPDFKQNNGKNP 

250 260 270 280 290 300 

310 

DVGNEVIRRRKGGX 
I I I I I I I I I II I I I 
DVGNEVIRRRKGGX 
310 
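The identity figures quoted for these alignments count identical positions over the overlap. A minimal sketch follows. It assumes ungapped rows of equal length, which holds for the alignments in this section. The function name `percent_identity` is our own, and the source does not say exactly how the original software scored gaps or stop symbols.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of identical positions over an ungapped, equal-length overlap."""
    if len(aligned_a) != len(aligned_b) or not aligned_a:
        raise ValueError("expected two non-empty rows of equal length")
    matches = sum(1 for a, b in zip(aligned_a, aligned_b) if a == b)
    return 100.0 * matches / len(aligned_a)


# Residues 241-300 of the ORF83 alignment above (upper row, then lower row).
upper = "TAAYESQYQEQYALWTGPYKVSKTVKASDRLMVDFSDITPYGDTTAQNRPDFKQNNGKKP"
lower = "TAAYESQYQEQYALWMGPYSVGKTVKASDRLMVDFSDITPYGDTTAQNRPDFKQNNGKNP"
print(f"{percent_identity(upper, lower):.1f}% over {len(upper)} positions")
```

By our count, the two rows differ at 9 of the 313 overlapping positions. Since 304/313 rounds to 97.1%, this matches the figure quoted for ORF83ng and ORF83-1.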






Based on this analysis, including the presence of a putative ATP/GTP-binding site motif A (P-loop) 
in the gonococcal protein (double-underlined) and a putative prokaryotic membrane lipoprotein 
lipid attachment site (single-underlined), it is predicted that the proteins from N.meningitidis and 
N.gonorrhoeae, and their epitopes, could be useful antigens for vaccines or diagnostics, or for 
raising antibodies. 
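Both features named above can be found by scanning the sequence for a pattern. The sketch below uses Python's `re` module with lookahead, so overlapping hits are also reported. The P-loop pattern is PROSITE PS00017. The lipoprotein pattern is our reading of PROSITE PS00013 and should be checked against the current entry. PROSITE also applies positional rules to that site, which are not reproduced here. The helper name `find_motif` is hypothetical.

```python
import re

# ATP/GTP-binding site motif A (P-loop), PROSITE PS00017: [AG]-x(4)-G-K-[ST].
P_LOOP = re.compile(r"(?=([AG].{4}GK[ST]))")

# Prokaryotic membrane lipoprotein lipid attachment site, as we read PROSITE PS00013:
# {DERK}(6)-[LIVMFWSTAG](2)-[LIVMFYSTAGCQ]-[AGS]-C. PROSITE's extra positional rules
# (the Cys must lie near the N-terminus) are NOT checked in this sketch.
LIPOBOX = re.compile(r"(?=([^DERK]{6}[LIVMFWSTAG]{2}[LIVMFYSTAGCQ][AGS]C))")


def find_motif(pattern: re.Pattern, protein: str) -> list[tuple[int, str]]:
    """Return (1-based start, matched residues) for every, possibly overlapping, hit."""
    return [(m.start() + 1, m.group(1)) for m in pattern.finditer(protein.upper())]


print(find_motif(P_LOOP, "QYALWMGPYSVGKTVKASDR"))   # ORF83ng residues 251-270
print(find_motif(P_LOOP, "QYALWTGPYKVSKTVKASDR"))   # ORF83-1 residues 251-270
print(find_motif(LIPOBOX, "MKTLLLLIPLVLTACGTLTG"))  # ORF83ng residues 1-20
```

On these fragments, the P-loop pattern matches GPYSVGKT (residues 257-264) in the gonococcal sequence. It does not match the corresponding meningococcal stretch, where GK is replaced by SK. This is consistent with the motif being reported only in the gonococcal protein. The lipoprotein pattern places the candidate lipidated cysteine at residue 15.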
Example 38

The following DNA sequence, believed to be complete, was identified in N. meningitidis <SEQ ID 319>:

1 ATGGCAGAGA TCTGTTTGAT AACCGGCACG CCCGGTTCAG GGAAAACATT 

51 AAAAATGGTT TCCATGATGG CGAATGATGA AATGTTTAAG CCTGATGAAA 

101 AAGCCATACG CCGTAAAGTA TTTACGAACA TAAAAGGCTT GAAAATACCG 

151 CACACCTACA TAGAAACGGA CGCAAAAAAG CTGCCGAAAT CGACAGATGA 

201 GCAGCTTTCG GCGCATGATA TGTACGAATG GATAAAGAAG CCCGAAAATA 

251 TCGGGTCTAT TGTCATTGTA GATGAAGCTC AAGACGTATG GCCGGCACGC 

301 TCGGCAGGTT CAAAAATCCC TGAAAATGTC CAATGGCTGA ATACGCACAG 

351 ACATCAGGGC ATTGATATAT TTGTTTTGAC TCAAGGTCCT AAGCTTCTAG 

401 ATCAAAATCT TAGAACGCTT GTACGGAAAC ATTACCACAT CGCTTCAAAC 

451 AAGATGGGTA TGCGTACGCT TTTAGAATGG AAAATATGCG CGGACGATCC 

501 CGTAAAAATG GCATCAAGCG CATTCTCCAG TATCTATACA CTGGATAAAA 

551 AAGTTTATGA CTTGTAysrr TmmGCGGAAG TTCATACCGT AAATAAGGTC 

601 AAGCGGTCAA AGTGGTTTTA CACTCTGCCa GTAATAGTAT TGCTGATTCC 

651 CGTGTTTGTC GGCCTGTCCT ATAAAATGTT GagCaGTTAC GGAAAAAAAC 

701 aGGAAGAACC CGCAGCACAA GAATCGGCGG CAACAGAACA GCAGGCAGTA 

751 CTTCCGGATA AAACAGAAGG CGAGCCGGTA AATAACGGCA ACCTTACCGC 

801 AGATATGTTT GTTCCGACAT TGTCCGAaAA ACCCGrAAGC AAGCcgaTTT 

851 ATAACGGTGT AAGGCAGGTA AGAACCTTTG AATATATAGC AGGCTGTATA 

901 GAAGGCGGAA GAACCGGATG CGCCTGCTAT TCGCaTCAAG GGACGGCATt 

951 gaAAGAAGTG ACGGaGTTGA TGTGc^aAgG aCTATGTaAA AAacGGCTTG 

1001 CCGTTTAACC CaTACAAAGA AGAAAGCCAA GGGCAGGAAG TTCAGCAAAG 

1051 CGCGCAgCAA CATTCGGACA GGGCGcCAAG TTGCCACATT GGGCGGAAAA 

1101 CCGTAGCAGA ACCTAATGTA CGATAATTGG GAAGAACGCG GGAAACCGTT 

1151 TGAAGGAATC GGaCGGGGGC GTGGTCGGAT CGGCAAACTG A 

This corresponds to the amino acid sequence <SEQ ID 320; ORF84>:

1 MAEICLITGT PGSGKTLKMV SMMANDEMFK PDEKAIRRKV FTNIKGLKIP 

51 HTYIETDAKK LPKSTDEQLS AHDMYEWIKK PENIGSIVIV DEAQDVWPAR 

101 SAGSKIPENV QWLNTHRHQG IDIFVLTQGP KLLDQNLRTL VRKHYHIASN 

151 KMGMRTLLEW KICADDPVKM ASSAFSSIYT LDKKVYDLYX XAEVHTVNKV 

201 KRSKWFYTLP VIVLLIPVFV GLSYKMLSSY GKKQEEPAAQ ESAATEQQAV 

251 LPDKTEGEPV NNGNLTADMF VPTLSEKPXS KPIYNGVRQV RTFEYIAGCI 

301 EGGRTGCACY SHQGTALKEV TELMCKDYVK NGLPFNPYKE ESQGQEVQQS 

351 AQQHSDRAQV ATLGGKPXQN LMYDNWEERG KPFEGIGGGV VGSAN* 
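Going from a nucleotide listing to its protein is a codon-by-codon lookup in the genetic code. SEQ ID 319 also contains IUPAC ambiguity codes (for example `y`, `s`, `r` and `m` at nucleotides 567-573), and the corresponding residues appear as Y, X, X at positions 189-191 of SEQ ID 320. The minimal sketch below assumes the standard genetic code (translation table 1). It also assumes that an ambiguous codon is translated only when every possible reading gives the same amino acid, and as X otherwise. The source does not state the rule the original software used, and the function names are our own.

```python
from itertools import product

# Standard genetic code (translation table 1); bases in TCAG order, '*' = stop.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}

# IUPAC nucleotide codes and the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def translate_codon(codon: str) -> str:
    """Translate one codon; an ambiguous codon gives a residue only if all readings agree."""
    choices = [IUPAC.get(base) for base in codon.upper()]
    if None in choices:  # unrecognised symbol, e.g. a dot marking an unread base
        return "X"
    residues = {CODON_TABLE["".join(bases)] for bases in product(*choices)}
    return residues.pop() if len(residues) == 1 else "X"


def translate(dna: str) -> str:
    """Translate in frame 1, ignoring whitespace and any trailing partial codon."""
    seq = "".join(dna.split())
    return "".join(translate_codon(seq[i:i + 3]) for i in range(0, len(seq) - 2, 3))


print(translate("ATGGCAGAGA TCTGTTTGAT AACCGGCACG"))  # start of SEQ ID 319
print(translate("TTGTAysrr TmmGCGGAA"))               # nucleotides 562-579 of SEQ ID 319
```

This sketch reads every codon with table 1, so an initiating GTG (as at the start of SEQ ID 327, listed as M in SEQ ID 328) comes out as V here. The listings translate an initiating GTG as methionine.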

Further work revealed the complete nucleotide sequence <SEQ ID 321>: 

1 ATGGCAGAGA TCTGTTTGAT AACCGGCACG CCCGGTTCAG GGAAAACATT 

51 AAAAATGGTT TCCATGATGG CGAATGATGA AATGTTTAAG CCTGATGAAA 

101 ACGGCATACG CCGTAAAGTA TTTACGAACA TAAAAGGCTT GAAAATACCG 

151 CACACCTACA TAGAAACGGA CGCAAAAAAG CTGCCGAAAT CGACAGATGA 

201 GCAGCTTTCG GCGCATGATA TGTACGAATG GATAAAGAAG CCCGAAAATA 

251 TCGGGTCTAT TGTCATTGTA GATGAAGCTC AAGACGTATG GCCGGCACGC 

301 TCGGCAGGTT CAAAAATCCC TGAAAATGTC CAATGGCTGA ATACGCACAG 

351 ACATCAGGGC ATTGATATAT TTGTTTTGAC TCAAGGTCCT AAGCTTCTAG 

401 ATCAAAATCT TAGAACGCTT GTACGGAAAC ATTACCACAT CGCTTCAAAC

451 AAGATGGGTA TGCGTACGCT TTTAGAATGG AAAATATGCG CGGACGATCC 

501 CGTAAAAATG GCATCAAGCG CATTCTCCAG TATCTATACA CTGGATAAAA 

551 AAGTTTATGA CTTGTACGAA TCAGCGGAAG TTCATACCGT AAATAAGGTC 

601 AAGCGGTCAA AGTGGTTTTA CACTCTGCCA GTAATAGTAT TGCTGATTCC 

651 CGTGTTTGTC GGCCTGTCCT ATAAAATGTT GAGCAGTTAC GGAAAAAAAC 

701 AGGAAGAACC CGCAGCACAA GAATCGGCGG CAACAGAACA GCAGGCAGTA 

751 CTTCCGGATA AAACAGAAGG CGAGCCGGTA AATAACGGCA ACCTTACCGC 

801 AGATATGTTT GTTCCGACAT TGTCCGAAAA ACCCGAAAGC AAGCCGATTT 

851 ATAACGGTGT AAGGCAGGTA AGAACCTTTG AATATATAGC AGGCTGTATA 

901 GAAGGCGGAA GAACCGGATG CGCCTGCTAT TCGCATCAAG GGACGGCATT 

951 GAAAGAAGTG ACGGAGTTGA TGTGCAAGGA CTATGTAAAA AACGGCTTGC 

1001 CGTTTAACCC ATACAAAGAA GAAAGCCAAG GGCAGGAAGT TCAGCAAAGC 

1051 GCGCAGCAAC ATTCGGACAG GGCGCAAGTT GCCACATTGG GCGGAAAACC 

1101 GTAGCAGAAC CTAATGTACG ATAATTGGGA AGAACGCGGG AAACCGTTTG 

1151 AAGGAATCGG CGGGGGCGTG GTCGGATCGG CAAACTGA 



This corresponds to the amino acid sequence <SEQ ID 322; ORF84-1>:

1 MAEICLITGT PGSGKTLKMV SMMANDEMFK PDENGIRRKV FTNIKGLKIP

51 HTYIETDAKK LPKSTDEQLS AHDMYEWIKK PENIGSIVIV DEAQDVWPAR 

101 SAGSKIPENV QWLNTHRHQG IDIFVLTQGP KLLDQNLRTL VRKHYHIASN 

151 KMGMRTLLEW KICADDPVKM ASSAFSSIYT LDKKVYDLYE SAEVHTVNKV 

201 KRSKWFYTLP VIVLLIPVFV GLSYKMLSSY GKKQEEPAAQ ESAATEQQAV

251 LPDKTEGEPV NNGNLTADMF VPTLSEKPES KPIYNGVRQV RTFEYIAGCI 

301 EGGRTGCACY SHQGTALKEV TELMCKDYVK NGLPFNPYKE ESQGQEVQQS 

351 AQQHSDRAQV ATLGGKP*QN LMYDNWEERG KPFEGIGGGV VGSAN* 

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A)

ORF84 shows 93.9% identity over a 395aa overlap with an ORF (ORF84a) from strain A of N. meningitidis:



10 20 30 40 50 60 

orf84.pep MAEICLITGTPGSGKTLKMVSMMANDEMFKPDEKAIRRKVFTNIKGLKIPHTYIETDAKK
I I I I I I I I I I I I I I I I I I I I I I I M I II I I I I I :: I I I I I I M M I I II I I I I I I I I I I I 
orf84a MAEICLITGTPGSGKTLKMVSMMANDEMFKPDENGIRRKVFTNIKGLKIPHTYIETDAKK

10 20 30 40 50 60 



70 80 90 100 110 120 

orf84.pep LPKSTDEQLSAHDMYEWIKKPENIGSIVIVDEAQDVWPARSAGSKIPENVQWLNTHRHQG
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
orf84a LPKSTDEQLSAHDMYEWIKKPENIGSIVIVDEAQDVWPARSAGSKIPENVQWLNTHRHQG

70 80 90 100 110 120 



130 140 150 160 170 180 

orf84.pep IDIFVLTQGPKLLDQNLRTLVRKHYHIASNKMGMRTLLEWKICADDPVKMASSAFSSIYT
I I I I I I I I I I I I I I I I I I I I I I i I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
orf84a IDIFVLTQGSKLLDQNLRTLVRKHYHIASNKMGMRTLLEWKICADDPVKMASSAFSSIYT

130 140 150 160 170 180 



190 200 210 220 230 240 

orf84.pep LDKKVYDLYXXAEVHTVNKVKRSKWFYTLPVIVLLIPVFVGLSYKMLSSYGKKQEEPAAQ
MINIMI I I I II M I I I I I M II II II I : I I II I I I II II II II ! I I 1 M M I II I 
orf84a LDKKVYDLYESAEVHTVNKVKRSKWFYTLPVIILLIPVFVGLSYKMLSSYGKKQEEPAAQ

190 200 210 220 230 240 



250 260 270 280 290 300 

orf84.pep ESAATEQQAVLPDKTEGEPVNNGNLTADMFVPTLSEKPXSKPIYNGVRQVRTFEYIAGCI

I I II I I : II I : I M I I \ II I I II I I I I I II I II II I I I I I I I I II II I II I II II II : 
orf84a ESAATEHQAVFQDKTEGEPVNNGNLTADMFVPTLSEKPESKPIYNGVRQVRTFEYIAGCV

250 260 270 280 290 300 

310 320 330 340 350 360 

orf84.pep EGGRTGCACYSHQGTALKEVTELMCKDYVKNGLPFNPYKEESQGQEVQQSAQQHSDRAQV

II II I I I : I I I I I II I I I I : 1 : 11 I I I :: I I I M 1 II I II I M :: It I I 1 = 1111 II 
orf84a EGGRTGCTCYSHQGTALKEITKEMCKDYARNGLPFNPYKEESQGRDVQQSEQHHSDRPQV

310 320 330 340 350 360 



370 380 390 

orf84.pep ATLGGKPXQNLMYDNWEERGKPFEGIGGGVVGSANX
II I I II I I II II I II : II I I I II I I I I I I I II II I 
orf84a ATLGGKPWQNLMYDNWQERGKPFEGIGGGVVGSANX

370 380 390 

The complete length ORF84a nucleotide sequence <SEQ ID 323> is: 



1 ATGGCAGAGA TCTGTTTGAT AACCGGCACG CCCGGTTCAG GGAAAACATT 

51 AAAAATGGTT TCCATGATGG CAAACGATGA AATGTTTAAG CCGGATGAAA 

101 ACGGCATACG CCGTAAAGTA TTTACGAACA TCAAAGGCTT GAAGATACCG 

151 CACACCTACA TAGAAACGGA CGCGAAAAAG CTGCCGAAAT CGACAGATGA 

201 GCAGCTTTCG GCGCATGATA TGTACGAATG GATAAAGAAG CCCGAAAATA 

251 TCGGGTCTAT TGTCATTGTA GATGAAGCTC AAGACGTATG GCCGGCACGC 

301 TCGGCAGGTT CAAAAATCCC TGAAAATGTC CAATGGCTGA ATACGCACAG 

351 ACATCAGGGC ATTGATATAT TTGTTTTGAC TCAAGGCTCT AAGCTTCTAG 

401 ATCAAAATCT TAGAACGCTT GTACGGAAAC ATTACCACAT CGCTTCAAAC 

451 AAGATGGGTA TGCGTACGCT TTTAGAATGG AAAATATGCG CGGACGATCC 
501 CGTAAAAATG GCATCAAGCG CATTCTCCAG TATCTATACA CTGGATAAAA

551 AAGTTTATGA CTTGTACGAA TCAGCGGAAG TTCATACCGT AAATAAGGTC 

601 AAGCGGTCAA AATGGTTTTA TACTCTGCCA GTAATAATAT TGCTGATTCC 

651 CGTTTTTGTC GGCCTGTCCT ATAAAATGTT AAGTAGTTAT GGAAAAAAAC 

701 AGGAAGAACC CGCAGCACAA GAATCGGCGG CAACAGAACA TCAGGCAGTA 

751 TTTCAGGATA AAACAGAAGG CGAGCCGGTA AACAACGGTA ACCTTACCGC 

801 AGATATGTTT GTTCCGACAT TGTCCGAAAA ACCCGAAAGC AAGCCGATTT 

851 ATAACGGTGT AAGGCAGGTA AGAACCTTTG AATATATAGC AGGCTGTGTA 

901 GAAGGCGGAA GAACCGGATG CACATGCTAT TCGCATCAAG GGACGGCATT 

951 GAAAGAAATT ACAAAGGAAA TGTGCAAGGA TTACGCAAGA AACGGATTGC 

1001 CGTTTAACCC ATATAAAGAA GAAAGCCAAG GGCGGGATGT CCAGCAAAGT 

1051 GAGCAGCACC ATTCGGACAG ACCGCAAGTT GCCACGTTGG GCGGAAAGCC 

1101 GTGGCAAAAT CTTATGTATG ATAATTGGCA GGAGCGCGGA AAACCGTTTG 

1151 AAGGAATCGG CGGGGGCGTG GTCGGATCGG CAAACTGA 

This encodes a protein having amino acid sequence <SEQ ID 324>: 

1 MAEICLITGT PGSGKTLKMV SMMANDEMFK PDENGIRRKV FTNIKGLKIP 

51 HTYIETDAKK LPKSTDEQLS AHDMYEWIKK PENIGSIVIV DEAQDVWPAR 

101 SAGSKIPENV QWLNTHRHQG IDIFVLTQGS KLLDQNLRTL VRKHYHIASN 

151 KMGMRTLLEW KICADDPVKM ASSAFSSIYT LDKKVYDLYE SAEVHTVNKV 

201 KRSKWFYTLP VIILLIPVFV GLSYKMLSSY GKKQEEPAAQ ESAATEHQAV

251 FQDKTEGEPV NNGNLTADMF VPTLSEKPES KPIYNGVRQV RTFEYIAGCV 

301 EGGRTGCTCY SHQGTALKEI TKEMCKDYAR NGLPFNPYKE ESQGRDVQQS 

351 EQHHSDRPQV ATLGGKPWQN LMYDNWQERG KPFEGIGGGV VGSAN* 

ORF84a and ORF84-1 show 95.2% identity in 395 aa overlap (upper row: orf84a.pep; lower row: orf84-1):

10 20 30 40 50 60 

MAEICLITGTPGSGKTLKMVSMMANDEMFKPDENGIRRKVFTNIKGLKIPHTYIETDAKK
Mill Mill Mill II MMMIMII I1IMM llllll I! 1 I I INI I I I MM II I 
MAEICLITGTPGSGKTLKMVSMMANDEMFKPDENGIRRKVFTNIKGLKIPHTYIETDAKK 
10 20 30 40 50 60 

70 80 90 100 110 120 

LPKSTDEQLSAHDMYEWIKKPENIGSIVIVDEAQDVWPARSAGSKIPENVQWLNTHRHQG 

I | | | | | | | M M M M I I M M M I I M M M M M M M 1 I M M I M M M M I M M 
LPKSTDEQLSAHDMYEWIKKPENIGSIVIVDEAQDVWPARSAGSKIPENVQWLNTHRHQG 

70 80 90 100 110 120 

130 140 150 160 170 180 

IDIFVLTQGSKLLDQNLRTLVRKHYHIASNKMGMRTLLEWKICADDPVKMASSAFSSIYT 

| | | ] I 1 I | | llllllllllllllllllllllllllllllllllllllllllilllllll 
IDIFVLTQGPKLLDQNLRTLVRKHYHIASNKMGMRTLLEWKICADDPVKMASSAFSSIYT 

130 140 150 160 170 180 

190 200 210 220 230 240 

LDKKVYDLYESAEVHTVNKVKRSKWFYTLPVIILLIPVFVGLSYKMLSSYGKKQEEPAAQ 
IN II II MM III I MMMIMII I II IMMMI MM I II I II I 111 M II II M I 
LDKKVYDLYESAEVHTVNKVKRSKWFYTLPVIVLLIPVFVGLSYKMLSSYGKKQEEPAAQ 
190 200 210 220 230 240 

250 260 270 280 290 300 

ESAATEHQAVFQDKTEGEPVNNGNLTADMFVPTLSEKPESKPIYNGVRQVRTFEYIAGCV 

||||||:1M: IIIIIIIIIIIIIIIIIIMIIIIIIIIIIIItlltlllllllllll: 
ESAATEQQAVLPDKTEGEPVNNGNLTADMFVPTLSEKPESKPIYNGVRQVRTFEYIAGCI 
250 260 270 280 290 300 

310 320 330 340 350 360 

EGGRTGCTCYSHQGTALKEITKEMCKDYARNGLPFNPYKEESQGRDVQQSEQHHSDRPQV 

ll|||||:)||||||||||:l: I I I I I :: I I I 1 I I I I I I I I I I I I I I MM II I I 
EGGRTGCACYSHQGTALKEVTELMCKDYVKNGLPFNPYKEESQGQEVQQSAQQHSDRAQV 

310 320 330 340 350 360 

370 380 390 

ATLGGKPWQNLMYDNWQERGKPFEGIGGGVVGSANX

MMMI lllllllhllMIIIIIIIIIMMM 
ATLGGKPXQNLMYDNWEERGKPFEGIGGGVVGSANX
370 380 390 



Homology with a predicted ORF from N. gonorrhoeae

ORF84 shows 94.2% identity over a 395aa overlap with a predicted ORF (ORF84ng) from N. gonorrhoeae:

orf84.pep MAEICLITGTPGSGKTLKMVSMMANDEMFKPDEKAIRRKVFTNIKGLKIPHTYIETDAKK  60
orf84ng   MAEICLITGTPGSGKTLKMVSMMANDEMFKPDENGVRRKVFTNIKGLKIPHTHIETDAKK  60

orf84.pep LPKSTDEQLSAHDMYEWIKKPENIGSIVIVDEAQDVWPARSAGSKIPENVQWLNTHRHQG 120
orf84ng   LPKSTDEQLSAHDMYEWIKKPENVGAIVIVDEAQDVWPARSAGSKIPENVQWLNTHRHQG 120

orf84.pep IDIFVLTQGPKLLDQNLRTLVRKHYHIASNKMGMRTLLEWKICADDPVKMASSAFSSIYT 180
orf84ng   IDIFVLTQGPKLLDQNLRTLVKRHYHIAANKMGLRTLLEWKVCADDPVKMASSAFSSIYT 180

orf84.pep LDKKVYDLYXXAEVHTVNKVKRSKWFYTLPVIVLLIPVFVGLSYKMLSSYGKKQEEPAAQ 240
orf84ng   LDKKVYDLYESAEIHTVNKVKRSKWFYALPVIILLIPLFVGLSYKMLGSYGKKQEEPAAQ 240

orf84.pep ESAATEQQAVLPDKTEGEPVNNGNLTADMFVPTLSEKPXSKPIYNGVRQVRTFEYIAGCI 300
orf84ng   ESAATEQQAVLPDKTEGESVNNGNLTADMFVPTLPEKPESKPIYNGVRQVRTFEYIAGCI 300

orf84.pep EGGRTGCACYSHQGTALKEVTELMCKDYVKNGLPFNPYKEESQGQEVQQSAQQHSDRAQV 360
orf84ng   EGGRTGCTCYSHQGTALKEVTELMCKDYVKNGLPFNPYKEESQGQEVQQSAQQHSDRAQV 360

orf84.pep ATLGGKPXQNLMYDNWEERGKPFEGIGGGVVGSAN 395
orf84ng   ATLGGKPQQNLMYDNWEERGKPFEGIGGGVVGSAN 395

The complete length ORF84ng nucleotide sequence <SEQ ID 325> is: 



1 ATGGCAGAAA TCTGTTTGAT AACCGGCACG CCCGGTTCAG GGAAAACATT 

51 AAAAATGGTT TCCATGATGG CAAACGATGA AATGTTTAAG CCAGATGAAA 

101 ACGGCGTACG CCGTAAAGTA TTTACGAACA TCAAAGGTTT GAAGATACCG 

151 CACACCCACA TAGAAACAGA CGCAAAGAAG CTGCCGAAAT CAACCGATGA 

201 ACAGCTTTCG GCGCATGATA TGTATGAATG GATCAAGAAG CCTGAAAacg 

251 tcggcgCAAT CGTTATTGTC GATGAGGCGC AAGACGTATG GCCCGCACGC 

301 TccgCAGGTT CGAAAATCCC CGAAAACGTC CAATGGCTGA ACACACACAG 

351 GCATCAGGGC ATAGATATAT TTGTATTGAC ACAAGGTCCT AAACTCTTAG 

401 ATCAGAACTT GCGAACATTG GTTAAAAGAC ATTACCACAT TGCGGCCAAC 

451 AAAATGGGTT TGCGTACCCT GCTTGAATGG AAAGTATGCG CGGATGACCC

501 GGTAAAAATG GCATCAAGTG CATTTTCCAG TATCTACACA CTGGATAAAA 

551 AAGTTTATGA CTTGTACGAA TCCGCAGAAA TTCACACGGT AAACAAAGTC 

601 AAGCGTTCAA AATGGTTTTA TGCATTGCCC GTCATCATAT TATTGATTCC 

651 GCTATTTGTC GGTTTGTCTT ACAAAATGTT GGGCAGTTAC GGAAAAAAAC 

701 AGGAAGAACC CGCAGCACAA GAATCGGCGG CAACAGAACA GCAGGCAGTA 

751 CTTCCGGATA AAACAGAAGG AGAATCGGTG AATAACGGAA ACCTTACGGC 

801 AGATATGTTT GTTCCGACAT TGCCCGAAAA ACCCGAAAGC AAGCCGATTT 

851 ATAACGGTGT AAGGCAGGTA AGGACCTTTG AATATATAGC AGGCTGTATA 

901 GAAGGCGGAA GAACCGGATG CACCTGCTAT TCGCATCAAG GGACGGCATT 

951 GAAAGAAGTG ACGGAGTTGA TGTGCAAGGA CTATGTAAAA AACGGCTTGC 

1001 CGTTTAACCC ATACAAAGAA GAAAGCCAAG GGCAGGAAGT TCAGCAAAGC 

1051 GCGCAGCAAC ATTCGGACAG GGCGCAAGTT GCCACCTTGG GCGGAAAACC 

1101 GCAGCAGAAC CTAATGTACG ACAATTGGGA AGAACGCGGG AAACCGTTTG 

1151 AAGGAATCGG CGGGGGCGTG GTCGGATCGG CAAACTGA 

This encodes a protein having amino acid sequence <SEQ ID 326>: 



1 MAEICLITGT PGSGKTLKMV SMMANDEMFK PDENGVRRKV FTNIKGLKIP

51 HTHIETDAKK LPKSTDEQLS AHDMYEWIKK PENVGAIVIV DEAQDVWPAR

101 SAGSKIPENV QWLNTHRHQG IDIFVLTQGP KLLDQNLRTL VKRHYHIAAN

151 KMGLRTLLEW KVCADDPVKM ASSAFSSIYT LDKKVYDLYE SAEIHTVNKV

201 KRSKWFYALP VIILLIPLFV GLSYKMLGSY GKKQEEPAAQ ESAATEQQAV

251 LPDKTEGESV NNGNLTADMF VPTLPEKPES KPIYNGVRQV RTFEYIAGCI

301 EGGRTGCTCY SHQGTALKEV TELMCKDYVK NGLPFNPYKE ESQGQEVQQS

351 AQQHSDRAQV ATLGGKPQQN LMYDNWEERG KPFEGIGGGV VGSAN*
ORF84ng and ORF84-1 show 95.4% identity in 395 aa overlap (upper row: orf84-1.pep; lower row: orf84ng):



10 20 30 40 50 60 

MAEICLITGTPGSGKTLKMVSMMANDEMFKPDENGIRRKVFTNIKGLKIPHTYIETDAKK 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I : 1 I I I I I I I I I I II I I I : I I I I I I I 
MAEICLITGTPGSGKTLKMVSMMANDEMFKPDENGVRRKVFTNIKGLKIPHTHIETDAKK 
10 20 30 40 50 60 

70 80 90 100 110 120 

LPKSTDEQLSAHDMYEWIKKPENIGSIVIVDEAQDVWPARSAGSKIPENVQWLNTHRHQG 

I I I I I I ] I I I I I I I I I I I ! I I I I : h I I I I I I ! I I I I II I I II I I I I I I I I I I I I I I I I I 
LPKSTDEQLSAHDMYEWIKKPENVGAIVIVDEAQDVWPARSAGSKIPENVQWLNTHRHQG
70 80 90 100 110 120 

130 140 150 160 170 180 

IDIFVLTQGPKLLDQNLRTLVRKHYHIASNKMGMRTLLEWKICADDPVKMASSAFSSIYT 

I | | I I I I I II I II I I I I I I II :: I I I I I : I I II : I I I I I I I : I I I I I I I I I I I I I I I I I I 
IDIFVLTQGPKLLDQNLRTLVKRHYHIAANKMGLRTLLEWKVCADDPVKMASSAFSSIYT 
130 140 150 160 170 180 

190 200 210 220 230 240 

LDKKVYDLYESAEVHTVNKVKRSKWFYTLPVIVLLIPVFVGLSYKMLSSYGKKQEEPAAQ 

I I I I II I I I I I M : I I I I I I I I I I I I I : I I I I : I 11 I : I I II I I I I I : I I I I I M I I I I I 
LDKKVYDLYESAEIHTVNKVKRSKWFYALPVIILLIPLFVGLSYKMLGSYGKKQEEPAAQ 
190 200 210 220 230 240 

250 260 270 280 290 300 

ESAATEQQAVLPDKTEGEPVNNGNLTADMFVPTLSEKPESKPIYNGVRQVRTFEYIAGCI 
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I 
ESAATEQQAVLPDKTEGESVNNGNLTADMFVPTLPEKPESKPIYNGVRQVRTFEYIAGCI 

250 260 270 280 290 300 

310 320 330 340 350 360 

EGGRTGCACYSHQGTALKEVTELMCKDYVKNGLPFNPYKEESQGQEVQQSAQQHSDRAQV 

I I II I II : I I I I I I I I I I I I I M I I I M I I I I I I I I I I I I I I M I I I M M I I M I I I I I 
EGGRTGCTCYSHQGTALKEVTELMCKDYVKNGLPFNPYKEESQGQEVQQSAQQHSDRAQV

310 320 330 340 350 360 

370 380 390 

ATLGGKPXQNLMYDNWEERGKPFEGIGGGVVGSANX

M I I I I I I 1 I I I I I I I I I I M I 1 I I I M I I I I I II 
ATLGGKPQQNLMYDNWEERGKPFEGIGGGVVGSANX 
370 380 390 






Based on this analysis, including the presence of a putative transmembrane domain (single-underlined)
in the gonococcal protein and a putative ATP/GTP-binding site motif A (P-loop, double-underlined),
it is predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes,
could be useful antigens for vaccines or diagnostics, or for raising antibodies.
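Putative transmembrane domains such as the one mentioned above are commonly flagged from a hydropathy profile. The sketch below uses the Kyte-Doolittle scale with a 19-residue window and a mean-hydropathy threshold of 1.6, which is a widely used heuristic. The source does not say which method or parameters were used in the original analysis, so treat both as assumptions. The function name `hydrophobic_windows` is hypothetical.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobic_windows(seq: str, window: int = 19, threshold: float = 1.6):
    """Return (1-based window start, mean hydropathy) for windows above the threshold."""
    seq = seq.upper()
    hits = []
    for start in range(len(seq) - window + 1):
        segment = seq[start:start + window]
        if any(res not in KD for res in segment):
            continue  # skip windows containing X, '*' or other non-standard symbols
        mean = sum(KD[res] for res in segment) / window
        if mean > threshold:
            hits.append((start + 1, round(mean, 2)))
    return hits


# Residues 201-240 of ORF84ng (SEQ ID 326).
print(hydrophobic_windows("KRSKWFYALPVIILLIPLFVGLSYKMLGSYGKKQEEPAAQ"))
```

In this fragment, windows covering the hydrophobic run VIILLIPLFV (residues 211-220 of ORF84ng) exceed the threshold. A charged stretch such as GKKQEEPAAQESAATEQQAV yields no hits.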



Example 39 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 327>:



1 GTGGTTTTCC TGAATGCCGA CAACGGGATA TTGGTTCAGG ACTTGCCTTT
51 TGAAGTCAAA CTGAAAAAAT TCCATATCGA TTTTTACAAT ACGGGTATGC
101 CGCGTGATTT CGCCAGCGAT ATTGAAGTGA CGGACAAGGC AACCGGTGAG
151 AAACTCGAGC GCACCATCCG CGTGAACCAT CCTTTGACCT TGCACGGCAT
201 CACGATTTAT CAGGCGAGTT TTGCCGACGG CGGTTCGGAT TTGACATTCA
251 AGGCGTGGAA TTTGGGTGAT GCTTCGCGCG AGCCTGTCGT GTTGAAGGCA
301 ACATCCATAC ACCAGTTTCC GTTGGAAATT GGCAAACACA AATATCGTCT
351 TGAGTTCGAT CAGTTCACTT CTATGAATGT GGAGGACATG AGCGAGGGCG
401 CGGAACGGGA AAAAAGCCTG AAATCCACGC TGCCCGATGT CCGCGCCGTT
451 ACTCAGGAAG GTCACAAATA CACCAAT...                 TACCG
501 TATCCGTGAT GCGCCAGGCC AGGCGGTCGA ATATAAAAAC TATATGCTGC
551 CGGTTTTGCA GGAACAGGAT TATTTTTGGA TTACCGGCAC GCGCAGCGC.
601 TTGCAGCAGC AATACCGCTG GCTGCGTATC CCCTTGGACA AGCAGTTGAA
651 AGCGGACACC TTTATGGCAT TGCGTGAGTT TTTGAAAGAT GGGGAAGGGC
701 GCAAACGTCT .GTTGCCGAC GCAACCAAAG GCGCACCTGC CGAAATCCGC
751 GAACAATTCA TGCTGGCTGC GGAAAACACG CTGAACATCT TTGCACAAAA
801 AGGCTATTTG GGATTGGACG AATTTATTAC GTCCAATATC CCGAAAGAGC
851 AGCAGGATAA GATGCAGGGC TATTTCTACG AAATGCTTTA CGGCGTGATG
901 AACGCTGCTT TGGATGAAAC CAT.ACCCGG TACGGCTTGC CCGAATGGCA
951 GCAGGATGAA GCGCGGAATC GTTTCCTGCT GCACAGTATG GATGCGTACA
1001 CGGGTTTGAC CGAATATCCC GCGCCTATGC TGCTGCAACT TGATGGGTTT
1051 TCCGAGGTGC GTTCGTCGGG TTTGCAGATG ACCCGTTCCC C.GGTCCGCT
1101 TTTGGTCTAT CTC...



This corresponds to the amino acid sequence <SEQ ID 328; ORF88>: 



1 MVFLNADNGI LVQDLPFEVK LKKFHIDFYN TGMPRDFASD IEVTDKATGE

51 KLERTIRVNH PLTLHGITIY QASFADGGSD LTFKAWNLGD ASREPVVLKA

101 TSIHQFPLEI GKHKYRLEFD QFTSMNVEDM SEGAEREKSL KSTLPDVRAV

151 TQEGHKYTNX XXXXXYRIRD APGQAVEYKN YMLPVLQEQD YFWITGTRSX

201 LQQQYRWLRI PLDKQLKADT FMALREFLKD GEGRKRXVAD ATKGAPAEIR

251 EQFMLAAENT LNIFAQKGYL GLDEFITSNI PKEQQDKMQG YFYEMLYGVM

301 NAALDETXTR YGLPEWQQDE ARNRFLLHSM DAYTGLTEYP APMLLQLDGF

351 SEVRSSGLQM TRSXGPLLVY L...



Further work revealed the complete nucleotide sequence <SEQ ID 329>: 



1 ATGAGTAAAT CCCGTAGATC TCCCCCACTT CTTTCCCGTC CGTGGTTCGC
51 TTTTTTCAGC TCCATGCGCT TTGCAGTCGC TTTGCTCAGT CTGCTGGGTA
101 TTGCATCGGT TATCGGTACG GTGTTGCAGC AAAACCAGCC GCAGACGGAT
151 TATTTGGTCA AATTCGGATC GTTTTGGGCG CAGATTTTTG GTTTTCTGGG
201 ACTGTATGAC GTCTATGCTT CGGCATGGTT TGTCGTTATC ATGATGTTTT
251 TGGTGGTTTC TACCAGTTTG TGCCTGATTC GCAATGTGCC GCCGTTCTGG
301 CGCGAAATGA AGTCTTTTCG GGAAAAGGTT AAAGAAAAAT CTCTGGCGGC
351 GATGCGCCAT TCTTCGCTGT TGGATGTAAA AATTGCGCCC GAGGTTGCCA
401 AACGTTATCT GGAAGTACAA GGTTTTCAGG GAAAAACCAT TAACCGTGAA
451 GACGGGTCGG TTCTGATTGC CGCCAAAAAA GGCACAATGA ACAAATGGGG
501 CTATATCTTT GCCCATGTTG CTTTGATTGT CATTTGCCTG GGCGGGTTGA
551 TAGACAGTAA CCTGCTGTTG AAACTGGGTA TGCTGACCGG TCGGATTGTT
601 CCGGACAATC AGGCGGTTTA TGCCAAGGAT TTCAAGCCCG AAAGTATTTT
651 GGGTGCGTCC AATCTCTCAT TTAGGGGCAA CGTCAATATT TCCGAGGGGC
701 AGAGTGCGGA TGTGGTTTTC CTGAATGCCG ACAACGGGAT ATTGGTTCAG
751 GACTTGCCTT TTGAAGTCAA ACTGAAAAAA TTCCATATCG ATTTTTACAA
801 TACGGGTATG CCGCGTGATT TCGCCAGCGA TATTGAAGTG ACGGACAAGG
851 CAACCGGTGA GAAACTCGAG CGCACCATCC GCGTGAACCA TCCTTTGACC
901 TTGCACGGCA TCACGATTTA TCAGGCGAGT TTTGCCGACG GCGGTTCGGA
951 TTTGACATTC AAGGCGTGGA ATTTGGGTGA TGCTTCGCGC GAGCCTGTCG
1001 TGTTGAAGGC AACATCCATA CACCAGTTTC CGTTGGAAAT TGGCAAACAC
1051 AAATATCGTC TTGAGTTCGA TCAGTTCACT TCTATGAATG TGGAGGACAT
1101 GAGCGAGGGC GCGGAACGGG AAAAAAGCCT GAAATCCACG CTGAACGATG
1151 TCCGCGCCGT TACTCAGGAA GGTAAAAAAT ACACCAATAT CGGCCCTTCC
1201 ATTGTTTACC GTATCCGTGA TGCGGCAGGG CAGGCGGTCG AATATAAAAA
1251 CTATATGCTG CCGGTTTTGC AGGAACAGGA TTATTTTTGG ATTACCGGCA
1301 CGCGCAGCGG CTTGCAGCAG CAATACCGCT GGCTGCGTAT CCCCTTGGAC
1351 AAGCAGTTGA AAGCGGACAC CTTTATGGCA TTGCGTGAGT TTTTGAAAGA
1401 TGGGGAAGGG CGCAAACGTC TGGTTGCCGA CGCAACCAAA GGCGCACCTG
1451 CCGAAATCCG CGAACAATTC ATGCTGGCTG CGGAAAACAC GCTGAACATC
1501 TTTGCACAAA AAGGCTATTT GGGATTGGAC GAATTTATTA CGTCCAATAT
1551 CCCGAAAGAG CAGCAGGATA AGATGCAGGG CTATTTCTAC GAAATGCTTT
1601 ACGGCGTGAT GAACGCTGCT TTGGATGAAA CCATACGCCG GTACGGCTTG
1651 CCCGAATGGC AGCAGGATGA AGCGCGGAAT CGTTTCCTGC TGCACAGTAT
1701 GGATGCGTAC ACGGGTTTGA CCGAATATCC CGCGCCTATG CTGCTGCAAC
1751 TTGATGGGTT TTCCGAGGTG CGTTCGTCGG GTTTGCAGAT GACCCGTTCC
1801 CCGGGTGCGC TTTTGGTCTA TCTCGGCTCG GTGCTGTTGG TATTGGGTAC
1851 GGTATTGATG TTTTATGTGC GCGAAAAACG GGCGTGGGTA TTGTTTTCAG
1901 ACGGCAAAAT CCGTTTTGCC ATGTCTTCGG CCCGCAGCGA ACGGGATTTG
1951 CAGAAGGAAT TTCCAAAACA CGTCGAGAGT CTGCAACGGC TCGGCAAGGA
2001 CTTGAATCAT GACTGA



This corresponds to the amino acid sequence <SEQ ID 330; ORF88-1>:

1 MSKSRRSPPL LSRPWFAFFS SMRFAVALLS LLGIASVIGT VLQQNQPQTD

51 YLVKFGSFWA QIFGFLGLYD VYASAWFVVI MMFLVVSTSL CLIRNVPPFW

101 REMKSFREKV KEKSLAAMRH SSLLDVKIAP EVAKRYLEVQ GFQGKTINRE

151 DGSVLIAAKK GTMNKWGYIF AHVALIVICL GGLIDSNLLL KLGMLTGRIV

201 PDNQAVYAKD FKPESILGAS NLSFRGNVNI SEGQSADVVF LNADNGILVQ

251 DLPFEVKLKK FHIDFYNTGM PRDFASDIEV TDKATGEKLE RTIRVNHPLT

301 LHGITIYQAS FADGGSDLTF KAWNLGDASR EPVVLKATSI HQFPLEIGKH

351 KYRLEFDQFT SMNVEDMSEG AEREKSLKST LNDVRAVTQE GKKYTNIGPS

401 IVYRIRDAAG QAVEYKNYML PVLQEQDYFW ITGTRSGLQQ QYRWLRIPLD

451 KQLKADTFMA LREFLKDGEG RKRLVADATK GAPAEIREQF MLAAENTLNI

501 FAQKGYLGLD EFITSNIPKE QQDKMQGYFY EMLYGVMNAA LDETIRRYGL

551 PEWQQDEARN RFLLHSMDAY TGLTEYPAPM LLQLDGFSEV RSSGLQMTRS

601 PGALLVYLGS VLLVLGTVLM FYVREKRAWV LFSDGKIRFA MSSARSERDL

651 QKEFPKHVES LQRLGKDLNH D*
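Placing the partial sequence of SEQ ID 327 on the complete sequence of SEQ ID 329 amounts to a string search. Unresolved positions in the partial read (the dots and ellipses in its listing) must be allowed to match any base. The minimal sketch below assumes exact matching at every resolved position. The function name `locate_partial` is hypothetical.

```python
def locate_partial(partial: str, complete: str) -> list[int]:
    """1-based start positions where partial fits complete; non-ACGT symbols in partial match anything."""
    partial = "".join(partial.split()).upper()
    complete = "".join(complete.split()).upper()
    hits = []
    for start in range(len(complete) - len(partial) + 1):
        window = complete[start:start + len(partial)]
        # A resolved base must match exactly; '.', 'N', etc. act as wildcards.
        if all(p not in "ACGT" or p == c for p, c in zip(partial, window)):
            hits.append(start + 1)
    return hits


# First 20 nt of SEQ ID 327 against nucleotides 701-750 of SEQ ID 329.
print(locate_partial("GTGGTTTTCC TGAATGCCGA",
                     "AGAGTGCGGA TGTGGTTTTC CTGAATGCCG ACAACGGGAT ATTGGTTCAG"))
```

The two listings also differ at a few resolved positions. For example, SEQ ID 327 reads `TGCCCGATGT` where SEQ ID 329 reads `TGAACGATGT`. An exact search with the whole partial read would therefore fail. Anchoring on its leading stretch places it at nucleotide 712 of SEQ ID 329, by our reading of the listings above.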

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A) 

ORF88 shows 95.7% identity over a 371aa overlap with an ORF (ORF88a) from strain A of N. meningitidis:



10 20 30 

orf88.pep MVFLNADNGILVQDLPFEVKLKKFHIDFYN

: I I I I I I I I I I I I I t I I I I I I I I I I I i I I I 
orf88a AKDFKPESILGASNLSFRGNVNISEGQSADVVFLNADNGILVQDLPFEVKLKKFHIDFYN
210 220 230 240 250 260 

40 50 60 70 80 90 

orf88.pep TGMPRDFASDIEVTDKATGEKLERTIRVNHPLTLHGITIYQASFADGGSDLTFKAWNLGD 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I 
orf88a TGMPRDFASDIEVTDKATGEKLERTIRVNHPLTLHGITIYQASFADGGSDLTFKAWNLGD 

270 280 290 300 310 320 

100 110 120 130 140 150 

orf88.pep ASREPVVLKATSIHQFPLEIGKHKYRLEFDQFTSMNVEDMSEGAEREKSLKSTLPDVRAV

1 M I I I I ! I I I ! I i I I I I M M I I I I I I M M I II I I I I I I I I I I I II I I I M I Mill 
orf88a ASREPVVLKATSIHQFPLEIGKHKYRLEFDQFTSMNVEDMSEGAEREKSLKSTLNDVRAV

330 340 350 360 370 380

160 170 180 190 200 210 

orf88.pep TQEGHKYTNXXXXXXYRIRDAPGQAVEYKNYMLPVLQEQDYFWITGTRSXLQQQYRWLRI

1111:1111 I I I I I I I I I I I I I I I I II I I I I I I I I I I I I M I I I I I I I I I I I 

orf88a TQEGKKYTNIGPSIVYRIRDAAGQAVEYKNYMLPVLQEQDYFWITGTRSGLQQQYRWLRI

390 400 410 420 430 440 

220 230 240 250 260 270 

orf88.pep PLDKQLKADTFMALREFLKDGEGRKRXVADATKGAPAEIREQFMLAAENTLNIFAQKGYL

I I I I I I I I I I I I I M I I I I II I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
orf88a PLDKQLKADTFMALREFLKDGEGRKRLVADATKGAPAEIREQFMLAAENTLNIFAQKGYL

450 460 470 480 490 500 

280 290 300 310 320 330 

orf88.pep GLDEFITSNIPKEQQDKMQGYFYEMLYGVMNAALDETXTRYGLPEWQQDEARNRFLLHSM

I I I I I I I I 1 I I I I I I I I II I I I I I I I I II I I I II I I I I I II I I I I I I I I I I I I I I M I 
orf88a GLDEFITSNIPKEQQDKMQGYFYEMLYGVMNAALDETIRRYGLPEWQQDEARNRFLLHSM
510 520 530 540 550 560 



340 350 360 370 

orf88.pep DAYTGLTEYPAPMLLQLDGFSEVRSSGLQMTRSXGPLLVYL

I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I 
orf88a DAYTGLTEYPAPMLLQLDGFSEVRSSGLQMTRSPGALLVYLGSVLLVLGTVLMFYVREKR
570 580 590 600 610 620 



orf88a AWVLFSDGKIRFAMSSARSERDLQKEFPKHVESLQRLGKDLNHDX
630 640 650 660 670 

The complete length ORF88a nucleotide sequence <SEQ ID 331> is:

1 ATGAGTAAAT CCCGTAGATC TCCCCCACTT CTTTCCCGTC CGTGGTTCGC 
51 TTTTTTCAGC TCCATGCGCT TTGCGGTCGC TTTGCTCAGT CTGCTGGGTA 
101 TTGCATCGGT TATCGGTACG GTGTTGCAGC AAAACCAGCC GCAGACGGAT 
151 TATTTGGTCA AATTCGGATC GTTTTGGGCG CAGATTTTTG GTTTTCTGGG
201 ACTGTATGAC GTCTATGCTT CGGCATGGTT TGTCGTTATC ATGATGTTTT
251 TGGTGGTTTC TACCAGTTTG TGCCTGATTC GCAATGTGCC GCCGTTCTGG
301 CGCGAAATGA AGTCTTTTCG GGAAAAGGTT AAAGAAAAAT CTCTGGCGGC
351 GATGCGCCAT TCTTCGCTGT TGGATGTAAA AATTGCGCCC GAGGTTGCCA
401 AACGTTATCT GGAAGTACAA GGTTTTCAGG GAAAAACCAT TAACCGTGAA
451 GACGGGTCGG TTCTGATTGC CGCCAAAAAA GGCACAATGA ACAAATGGGG
501 CTATATCTTT GCCCATGTTG CTTTGATTGT CATTTGCCTG GGCGGGTTGA
551 TAGACAGTAA CCTGCTGTTG AAACTGGGTA TGCTGACCGG TCGGATTGTT
601 CCGGACAATC AGGCGGTTTA TGCCAAGGAT TTCAAGCCCG AAAGTATTTT
651 GGGTGCGTCC AATCTCTCAT TTAGGGGCAA CGTCAATATT TCCGAGGGGC
701 AGAGTGCGGA TGTGGTTTTC CTGAATGCCG ACAACGGGAT ATTGGTTCAG
751 GACTTGCCTT TTGAAGTCAA ACTGAAAAAA TTCCATATCG ATTTTTACAA
801 TACGGGTATG CCGCGCGATT TTGCCAGTGA TATTGAAGTA ACGGATAAGG
851 CAACCGGTGA GAAACTCGAG CGCACCATCC GCGTGAACCA TCCTTTGACC
901 TTGCACGGCA TCACGATTTA TCAGGCGAGT TTTGCCGACG GCGGTTCGGA
951 TTTGACATTC AAGGCGTGGA ATTTGGGTGA TGCTTCGCGC GAGCCTGTCG
1001 TGTTGAAGGC AACATCCATA CACCAGTTTC CGTTGGAAAT TGGCAAACAC
1051 AAATATCGTC TTGAGTTCGA TCAGTTTACT TCTATGAATG TGGAGGACAT
1101 GAGCGAGGGC GCGGAACGGG AAAAAAGCCT GAAATCCACG CTGAACGATG
1151 TCCGCGCCGT TACTCAGGAA GGTAAAAAAT ACACCAATAT CGGCCCTTCC
1201 ATTGTTTACC GTATCCGTGA TGCGGCAGGG CAGGCGGTCG AATATAAAAA
1251 CTATATGCTG CCGGTTTTGC AGGAACAGGA TTATTTTTGG ATTACCGGCA
1301 CGCGCAGCGG CTTGCAGCAG CAATACCGCT GGCTGCGTAT CCCCTTGGAC
1351 AAGCAGTTGA AAGCGGACAC CTTTATGGCA TTGCGTGAGT TTTTGAAAGA
1401 TGGGGAAGGG CGCAAACGTC TGGTTGCCGA CGCAACCAAA GGCGCACCTG
1451 CCGAAATCCG CGAACAATTC ATGCTGGCTG CGGAAAACAC GCTGAACATC
1501 TTTGCACAAA AAGGCTATTT GGGATTGGAC GAATTTATTA CGTCCAATAT
1551 CCCGAAAGAG CAGCAGGATA AGATGCAGGG CTATTTCTAC GAAATGCTTT
1601 ACGGCGTGAT GAACGCTGCT TTGGATGAAA CCATACGCCG GTACGGCTTG
1651 CCCGAATGGC AGCAGGATGA AGCGCGGAAT CGTTTCCTGC TGCACAGTAT
1701 GGATGCGTAC ACGGGTTTGA CCGAATATCC CGCGCCTATG CTGCTGCAAC
1751 TTGATGGGTT TTCCGAGGTG CGTTCGTCGG GTTTGCAGAT GACCCGTTCC
1801 CCGGGTGCGC TTTTGGTCTA TCTCGGCTCG GTGCTGTTGG TATTGGGTAC
1851 GGTATTGATG TTTTATGTGC GCGAAAAACG GGCGTGGGTA TTGTTTTCAG
1901 ACGGCAAAAT CCGTTTTGCC ATGTCTTCGG CCCGCAGCGA ACGGGATTTG
1951 CAGAAGGAAT TTCCAAAACA CGTCGAGAGT CTGCAACGGC TCGGCAAGGA
2001 CTTGAATCAT GACTGA



This encodes a protein having amino acid sequence <SEQ ID 332>: 



1 MSKSRRSPPL LSRPWFAFFS SMRFAVALLS LLGIASVIGT VLQQNQPQTD

51 YLVKFGSFWA QIFGFLGLYD VYASAWFVVI MMFLVVSTSL CLIRNVPPFW

101 REMKSFREKV KEKSLAAMRH SSLLDVKIAP EVAKRYLEVQ GFQGKTINRE

151 DGSVLIAAKK GTMNKWGYIF AHVALIVICL GGLIDSNLLL KLGMLTGRIV

201 PDNQAVYAKD FKPESILGAS NLSFRGNVNI SEGQSADVVF LNADNGILVQ

251 DLPFEVKLKK FHIDFYNTGM PRDFASDIEV TDKATGEKLE RTIRVNHPLT

301 LHGITIYQAS FADGGSDLTF KAWNLGDASR EPVVLKATSI HQFPLEIGKH

351 KYRLEFDQFT SMNVEDMSEG AEREKSLKST LNDVRAVTQE GKKYTNIGPS

401 IVYRIRDAAG QAVEYKNYML PVLQEQDYFW ITGTRSGLQQ QYRWLRIPLD

451 KQLKADTFMA LREFLKDGEG RKRLVADATK GAPAEIREQF MLAAENTLNI

501 FAQKGYLGLD EFITSNIPKE QQDKMQGYFY EMLYGVMNAA LDETIRRYGL

551 PEWQQDEARN RFLLHSMDAY TGLTEYPAPM LLQLDGFSEV RSSGLQMTRS

601 PGALLVYLGS VLLVLGTVLM FYVREKRAWV LFSDGKIRFA MSSARSERDL

651 QKEFPKHVES LQRLGKDLNH D*

ORF88a and ORF88-1 show 100.0% identity in 671 aa overlap:



orf88a.pep MSKSRRSPPLLSRPWFAFFSSMRFAVALLSLLGIASVIGTVLQQNQPQTDYLVKFGSFWA 60
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf88-1    MSKSRRSPPLLSRPWFAFFSSMRFAVALLSLLGIASVIGTVLQQNQPQTDYLVKFGSFWA 60

orf88a.pep QIFGFLGLYDVYASAWFVVIMMFLVVSTSLCLIRNVPPFWREMKSFREKVKEKSLAAMRH 120
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf88-1    QIFGFLGLYDVYASAWFVVIMMFLVVSTSLCLIRNVPPFWREMKSFREKVKEKSLAAMRH 120

orf88a.pep SSLLDVKIAPEVAKRYLEVQGFQGKTINREDGSVLIAAKKGTMNKWGYIFAHVALIVICL 180
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf88-1    SSLLDVKIAPEVAKRYLEVQGFQGKTINREDGSVLIAAKKGTMNKWGYIFAHVALIVICL 180

orf88a.pep GGLIDSNLLLKLGMLTGRIVPDNQAVYAKDFKPESILGASNLSFRGNVNISEGQSADVVF 240
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf88-1    GGLIDSNLLLKLGMLTGRIVPDNQAVYAKDFKPESILGASNLSFRGNVNISEGQSADVVF 240

orf88a.pep LNADNGILVQDLPFEVKLKKFHIDFYNTGMPRDFASDIEVTDKATGEKLERTIRVNHPLT 300
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf88-1    LNADNGILVQDLPFEVKLKKFHIDFYNTGMPRDFASDIEVTDKATGEKLERTIRVNHPLT 300

orf88a.pep LHGITIYQASFADGGSDLTFKAWNLGDASREPVVLKATSIHQFPLEIGKHKYRLEFDQFT 360
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf88-1    LHGITIYQASFADGGSDLTFKAWNLGDASREPVVLKATSIHQFPLEIGKHKYRLEFDQFT 360

orf88a.pep SMNVEDMSEGAEREKSLKSTLNDVRAVTQEGKKYTNIGPSIVYRIRDAAGQAVEYKNYML 420
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf88-1    SMNVEDMSEGAEREKSLKSTLNDVRAVTQEGKKYTNIGPSIVYRIRDAAGQAVEYKNYML 420

orf88a.pep PVLQEQDYFWITGTRSGLQQQYRWLRIPLDKQLKADTFMALREFLKDGEGRKRLVADATK 480
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf88-1    PVLQEQDYFWITGTRSGLQQQYRWLRIPLDKQLKADTFMALREFLKDGEGRKRLVADATK 480

orf88a.pep GAPAEIREQFMLAAENTLNIFAQKGYLGLDEFITSNIPKEQQDKMQGYFYEMLYGVMNAA 540
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf88-1    GAPAEIREQFMLAAENTLNIFAQKGYLGLDEFITSNIPKEQQDKMQGYFYEMLYGVMNAA 540

orf88a.pep LDETIRRYGLPEWQQDEARNRFLLHSMDAYTGLTEYPAPMLLQLDGFSEVRSSGLQMTRS 600
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf88-1    LDETIRRYGLPEWQQDEARNRFLLHSMDAYTGLTEYPAPMLLQLDGFSEVRSSGLQMTRS 600

orf88a.pep PGALLVYLGSVLLVLGTVLMFYVREKRAWVLFSDGKIRFAMSSARSERDLQKEFPKHVES 660
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf88-1    PGALLVYLGSVLLVLGTVLMFYVREKRAWVLFSDGKIRFAMSSARSERDLQKEFPKHVES 660



LQRLGKDLNHD 

I I I I I I I I I I I 
LQRLGKDLNHD 



672 
672 



Homology with a predicted ORF from N. gonorrhoeae 

ORF88 shows 93.8% identity over a 371 aa overlap with a predicted ORF (ORF88.ng) from N. gonorrhoeae:

orf88.pep   MVFLNADNGILVQDLPFEVKLKKFHIDFYNTGMPRDFASDIEVTDKATGEKLERTIRVNH 60
            |||||||||:||||||||||||||||||||||||||||||||||||||||||||||||||
orf88ng     MVFLNADNGMLVQDLPFEVKLKKFHIDFYNTGMPRDFASDIEVTDKATGEKLERTIRVNH 60

orf88.pep   PLTLHGITIYQASFADGGSDLTFKAWNLGDASREPVVLKATSIHQFPLEIGKHKYRLEFD 120
            |||||||||||||||||||||||||||| |||||||||||||||||||||||||||||||
orf88ng     PLTLHGITIYQASFADGGSDLTFKAWNLRDASREPVVLKATSIHQFPLEIGKHKYRLEFD 120

orf88.pep   QFTSMNVEDMSEGAEREKSLKSTLPDVRAVTQEGHKYTNXXXXXXYRIRDAPGQAVEYKN 180
            |||||||||||||||||||||||| |||||||||:||||      |||||| ||||||||
orf88ng     QFTSMNVEDMSEGAEREKSLKSTLNDVRAVTQEGKKYTNIGPSIVYRIRDAAGQAVEYKN 180

orf88.pep   YMLPVLQEQDYFWITGTRSXLQQQYRWLRIPLDKQLKADTFMALREFLKDGEGRKRXVAD 240
            ||||:||::||||:||||| |||||||||||||||||||||||||||||||||||| |||
orf88ng     YMLPILQDKDYFWLTGTRSGLQQQYRWLRIPLDKQLKADTFMALREFLKDGEGRKRLVAD 240

orf88.pep   ATKGAPAEIREQFMLAAENTLNIFAQKGYLGLDEFITSNIPKEQQDKMQGYFYEMLYGVM 300
            ||| |||||||||||||||||||||||||||||||||||||| |||||||||||||||||
orf88ng     ATKDAPAEIREQFMLAAENTLNIFAQKGYLGLDEFITSNIPKGQQDKMQGYFYEMLYGVM 300

orf88.pep   NAALDETXTRYGLPEWQQDEARNRFLLHSMDAYTGLTEYPAPMLLQLDGFSEVRSSGLQM 360
            |||||||  |||||||||||||||||||||||||||||||||||||||||||||||||||
orf88ng     NAALDETIRRYGLPEWQQDEARNRFLLHSMDAYTGLTEYPAPMLLQLDGFSEVRSSGLQM 360

orf88.pep   TRSXGPLLVYL 371
            ||| | |||||
orf88ng     TRSPGALLVYLGSVLLVLGTVFMFYVPKKRAWVLFSNXKIRFAMSSARSERDLQKEFPKH 420
An ORF88ng nucleotide sequence <SEQ ID 333> was predicted to encode a protein having amino
acid sequence <SEQ ID 334>: 

1 MVTXNADNGM LVQDLPFEVK LKKFHIDFYN TGMPRDFASD IEVTDKATGE 

51 KLERTIRVNH PLTLHGITIY QASFADGGSD LTFKAWNLRD ASREPVVLKA

101 TSIHQFPLEI GKHKYRLEFD QFTSMNVEDM SEGAEREKSL KSTLNDVRAV 

151 TQEGKKYTNI GPSIVYRIRD AAGQAVEYKN YMLPILQDKD YFWLTGTRSG 

201 LQQQYRWLRI PLDKQLKADT FMALREFLKD GEGRKRLVAD ATKDAPAEIR 

251 EQFMLAAENT LNIFAQKGYL GLDEFITSNI PKGQQDKMQG YFYEMLYGVM 

301 NAALDETIRR YGLPEWQQDE ARNRFLLHSM DAYTGLTEYP APMLLQLDGF 

351 SEVRSSGLQM TRSPGALLVY LGSVLLVLGT VFMFYVPKKR AWVLFSNXKI

401 RFAMSSARSE RDLQKEFPKH VESLQRLGKD LNHD* 
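The ORF88ng protein above is described as predicted from nucleotide sequence, but the prediction method is not stated here. Purely as an illustration, a minimal forward-strand open-reading-frame scan (ATG to the first in-frame TAA, TAG or TGA) can be sketched as follows; the function name `find_orfs` and the `min_codons` cut-off are hypothetical, and practical gene predictors also consider alternative start codons such as GTG, the reverse strand and coding statistics.

```python
# Hypothetical helper for illustration: a forward-strand ORF scan that uses
# ATG as the only start codon and TAA/TAG/TGA as stops (standard code).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna: str, min_codons: int = 1) -> list[tuple[int, int]]:
    """Return 0-based, half-open (start, end) spans of forward-strand ORFs.

    Each span runs from an ATG through the first in-frame stop codon.
    `min_codons` is the minimum number of codons before the stop.
    ATGs lying inside an ORF that has already been scanned are skipped.
    """
    seq = dna.upper()
    orfs = []
    for frame in range(3):
        i = frame
        while i + 3 <= len(seq):
            if seq[i:i + 3] != "ATG":
                i += 3
                continue
            # Walk codon by codon until an in-frame stop is reached.
            j = i
            while j + 3 <= len(seq) and seq[j:j + 3] not in STOP_CODONS:
                j += 3
            if j + 3 > len(seq):
                break  # no stop before the end of the sequence in this frame
            if (j - i) // 3 >= min_codons:
                orfs.append((i, j + 3))
            i = j + 3  # resume scanning after the stop codon
    return sorted(orfs)


if __name__ == "__main__":
    print(find_orfs("CCATGAAATAGCC"))
```

Here the single ORF lies in the third frame and spans ATG-AAA-TAG; raising `min_codons` filters out such short candidates.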

Further work revealed the complete gonococcal DNA sequence <SEQ ID 335>: 



1 ATGAGTAAAT CCCGTATATC TCCCACACTT CTTTCCCGTC CGTGGTTCGC
51 TTTTTTCAGC TCCATGCGCT TTGCGGTCGC TTTGCTCAGT CTGCTGGGTA
101 TTGCATCGGT TATCGGCACG GTGTTACAGC AAAACCAGCC GCAGACGGAT
151 TATTTGGTCA AATTCGGACC GTTTTGGACT CGGATTTTTG ATTTTTTGGG
201 TTTGTATGAT GTCTATGCTT CGGCATGGTT TGTCGTTATC ATGATGTTTC
251 TGGTGGTTTC TACCAGTTTG TGTTTAATCC GTAACGTTCC GCCGTTTTGG
301 CGCGAAATGA AGTCTTTCCG GGAAAAGGTT AAAGAAAAAT CTCTGGCGGC
351 GATGCGCCAT TCTTCGCTGT TGGATGTAAA AATTGCCCCC GAAGTTGCCA
401 AACGTTATCT GGAGGTGCGG GGTTTTCAGG GAAAAACCGT CAGCCGTGAG
451 GACGGGTCGG TTCTGATTGC CGCCAAAAAA GGCAcaatga acaaATGGGG
501 CTATATCTTT GCccaagtag ctTTGATTGT CATTTGCCTG GGCGGGTTGA
551 TAGACAGTAA CCTGCTGCTG AAGCTGGGTA TGCTGGCCGG TCGGATTGTT
601 CCGGACAATC AGGCGGTTTA TGCCAAGGAT TTCAAGCCCG AAAGTATTTT
651 GGGTGCGTCC AATCTCTCAT TTAGGGGCAA CGTCAATATT TCCGAGGGGC
701 AAAGTGCGGA TGTGGTTTTC CTGAATGCCG ACAACGGGAT GTTGGTTCAG
751 GACTTGCCTT TTGAAGTCAA ACTGAAAAAA TTCCATATCG ATTTTTACAA
801 TACGGGTATG CCGCGCGATT TTGCCAGCGA TATTGAAGTA ACGGACAAGG
851 CAACCGGTGA GAAACTCGAG CGCACCATCC GCGTGAACCA TCCTTTGACC
901 TTGCACGGCA TCACGATTTA TCAGGCGAGT TTTGCCGACG GCGGTTCGGA
951 TTTGACATTC AAGGCGTGGA ATTTGAGGGA TGCTTCGCGC GAACCTGTCG
1001 TGTTGAAGGC AACCTCCATA CACCAGTTTC CGTTGGAAAT CGGCAAACAC
1051 AAATATCGTC TTGAGTTCGA TCAGTTCACT TCTATGAATG TGGAGGACAT
1101 GAGCGAGGGT GCGGAACGGG AAAAAAGCCT GAAATCCACT CTGAACGATG
1151 TCCGCGCCGT TACTCAGGAA GGTAAAAAAT ACACCAATAT CGGCCCTTCC
1201 ATCGTGTACC GCATCCGTGA TGcggCAGGG CAGGCGGTCG AATATAAAAA
1251 CTATATGCTG CCGATTTTGC AGGACAAAGA TTATTTTTGG CTGACCGGCA
1301 CGCGCAGCGG CTTGCAGCAG CAATACCGCT GGCTGCGTAT CCCCTTGGAC
1351 AAGCAGTTGA AAGCGGACAC CTTTATGGCA TTGCGTGAGT TTTTGAAAGA
1401 TGGGGAAGGG CGCAAACGTC TGGTTGCCGA CGCAACCAAA GACGCACCTG
1451 CCGAAATCCG CGAACAATTC ATGCTGGCTG CGGAAAACAC GCTGAATATC
1501 TTTGCGCAAA AAGGCTATTT GGGATTGGAC GAATTTATTA CGTCCAATAT
1551 CCCGAAAGGG CAGCAGGATA AGATGCAGGG CTATTTCTAC GAAATGCTTT
1601 ACGGCGTGAT GAACGCTGCT TTGGATGAAA CCATACGCCG GTACGGCTTG
1651 CCCGAATGGC AGCAGGATGA AGCGCGGAAC CGTTTCCTGC TGCACAGTAT
1701 GGATGCCTAT ACGGGGCTGA CGGAATATCC CGCGCCTATG CTGCTCCAGC
1751 TTGACGGGTT TTCCGAGGTG CGTTCCTCAG GTTTGCAGAT GACCCGTTCG
1801 CCGGGTGCGC TTTTGGTCTA TCtcggctcg gtattgttgg TTTTGGgtac
1851 ggtaTttatg tTTTATGTGC GCGAAAAACG GGCGTGGgta tTGTTTTCag
1901 aCGGCAAAAT CCGTTTTGCT ATGtCTTcgg CCcgcagcga ACGGGATTTG
1951 cAGAaggaaT TTCCAAAACA CGtcgAGAGC CTGCAACggc tcggcaaggA
2001 CttgaaTCAT GACTga



This corresponds to the amino acid sequence <SEQ ID 336; ORF88ng-1>:



1 MSKSRISPTL LSRPWFAFFS SMRFAVALLS LLGIASVIGT VLQQNQPQTD
51 YLVKFGPFWT RIFDFLGLYD VYASAWFVVI MMFLVVSTSL CLIRNVPPFW
101 REMKSFREKV KEKSLAAMRH SSLLDVKIAP EVAKRYLEVR GFQGKTVSRE
151 DGSVLIAAKK GTMNKWGYIF AQVALIVICL GGLIDSNLLL KLGMLAGRIV
201 PDNQAVYAKD FKPESILGAS NLSFRGNVNI SEGQSADVVF LNADNGMLVQ
251 DLPFEVKLKK FHIDFYNTGM PRDFASDIEV TDKATGEKLE RTIRVNHPLT
301 LHGITIYQAS FADGGSDLTF KAWNLRDASR EPVVLKATSI HQFPLEIGKH
351 KYRLEFDQFT SMNVEDMSEG AEREKSLKST LNDVRAVTQE GKKYTNIGPS
401 IVYRIRDAAG QAVEYKNYML PILQDKDYFW LTGTRSGLQQ QYRWLRIPLD
451 KQLKADTFMA LREFLKDGEG RKRLVADATK DAPAEIREQF MLAAENTLNI
501 FAQKGYLGLD EFITSNIPKG QQDKMQGYFY EMLYGVMNAA LDETIRRYGL
551 PEWQQDEARN RFLLHSMDAY TGLTEYPAPM LLQLDGFSEV RSSGLQMTRS
601 PGALLVYLGS VLLVLGTVFM FYVREKRAWV LFSDGKIRFA MSSARSERDL
651 QKEFPKHVES LQRLGKDLNH D* 

ORF88ng-1 and ORF88-1 show 97.0% identity in 671 aa overlap:

orf88-1.pep MSKSRRSPPLLSRPWFAFFSSMRFAVALLSLLGIASVIGTVLQQNQPQTDYLVKFGSFWA 60
            ||||| || ||||||||||||||||||||||||||||||||||||||||||||||| ||:
orf88ng-1   MSKSRISPTLLSRPWFAFFSSMRFAVALLSLLGIASVIGTVLQQNQPQTDYLVKFGPFWT 60

orf88-1.pep QIFGFLGLYDVYASAWFVVIMMFLVVSTSLCLIRNVPPFWREMKSFREKVKEKSLAAMRH 120
            :|| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf88ng-1   RIFDFLGLYDVYASAWFVVIMMFLVVSTSLCLIRNVPPFWREMKSFREKVKEKSLAAMRH 120

orf88-1.pep SSLLDVKIAPEVAKRYLEVQGFQGKTINREDGSVLIAAKKGTMNKWGYIFAHVALIVICL 180
            |||||||||||||||||||:||||||::|||||||||||||||||||||||:||||||||
orf88ng-1   SSLLDVKIAPEVAKRYLEVRGFQGKTVSREDGSVLIAAKKGTMNKWGYIFAQVALIVICL 180

orf88-1.pep GGLIDSNLLLKLGMLTGRIVPDNQAVYAKDFKPESILGASNLSFRGNVNISEGQSADVVF 240
            ||||||||||||||||:|||||||||||||||||||||||||||||||||||||||||||
orf88ng-1   GGLIDSNLLLKLGMLAGRIVPDNQAVYAKDFKPESILGASNLSFRGNVNISEGQSADVVF 240

orf88-1.pep LNADNGILVQDLPFEVKLKKFHIDFYNTGMPRDFASDIEVTDKATGEKLERTIRVNHPLT 300
            ||||||:|||||||||||||||||||||||||||||||||||||||||||||||||||||
orf88ng-1   LNADNGMLVQDLPFEVKLKKFHIDFYNTGMPRDFASDIEVTDKATGEKLERTIRVNHPLT 300

orf88-1.pep LHGITIYQASFADGGSDLTFKAWNLGDASREPVVLKATSIHQFPLEIGKHKYRLEFDQFT 360
            ||||||||||||||||||||||||| ||||||||||||||||||||||||||||||||||
orf88ng-1   LHGITIYQASFADGGSDLTFKAWNLRDASREPVVLKATSIHQFPLEIGKHKYRLEFDQFT 360

orf88-1.pep SMNVEDMSEGAEREKSLKSTLNDVRAVTQEGKKYTNIGPSIVYRIRDAAGQAVEYKNYML 420
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf88ng-1   SMNVEDMSEGAEREKSLKSTLNDVRAVTQEGKKYTNIGPSIVYRIRDAAGQAVEYKNYML 420

orf88-1.pep PVLQEQDYFWITGTRSGLQQQYRWLRIPLDKQLKADTFMALREFLKDGEGRKRLVADATK 480
            |:||::||||:|||||||||||||||||||||||||||||||||||||||||||||||||
orf88ng-1   PILQDKDYFWLTGTRSGLQQQYRWLRIPLDKQLKADTFMALREFLKDGEGRKRLVADATK 480

orf88-1.pep GAPAEIREQFMLAAENTLNIFAQKGYLGLDEFITSNIPKEQQDKMQGYFYEMLYGVMNAA 540
             |||||||||||||||||||||||||||||||||||||| ||||||||||||||||||||
orf88ng-1   DAPAEIREQFMLAAENTLNIFAQKGYLGLDEFITSNIPKGQQDKMQGYFYEMLYGVMNAA 540

orf88-1.pep LDETIRRYGLPEWQQDEARNRFLLHSMDAYTGLTEYPAPMLLQLDGFSEVRSSGLQMTRS 600
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf88ng-1   LDETIRRYGLPEWQQDEARNRFLLHSMDAYTGLTEYPAPMLLQLDGFSEVRSSGLQMTRS 600

orf88-1.pep PGALLVYLGSVLLVLGTVLMFYVREKRAWVLFSDGKIRFAMSSARSERDLQKEFPKHVES 660
            ||||||||||||||||||:|||||||||||||||||||||||||||||||||||||||||
orf88ng-1   PGALLVYLGSVLLVLGTVFMFYVREKRAWVLFSDGKIRFAMSSARSERDLQKEFPKHVES 660

orf88-1.pep LQRLGKDLNHD 671
            |||||||||||
orf88ng-1   LQRLGKDLNHD 671

Furthermore, ORF88ng-1 shows homology with a hypothetical protein from Aquifex aeolicus:



gi|2984296 (AE000771) hypothetical protein [Aquifex aeolicus] Length = 537
Score = 94.4 bits (231), Expect = 2e-18

Identities = 91/334 (27%), Positives = 159/334 (47%), Gaps = 59/334 (17%)

Query: 16  FAFFSSMRFAVALLSLLGIASVIG-TVLQQNQPQTDYLVKFGPFWTRIFDFLGLYDVYAS 74
           + F +S++ A+ ++ +LGI S++G T ++QNQ    YL +FG         L L DV+ S
Sbjct: 80  YDFLASLKLAIFIMLVLGILSMLGSTYIKQNQSFEWYLDQFGYDVGIWIWKLWLNDVFHS 139

Query: 75  AWFVVIMMFLVVSTSLCLIRNVPPFWREMKSFREKVKEKSLAAMRHSSLLDVKIAPEVAK 134
           ++++ ++ L V+ C I+ +P W++ S +E++ + A +H + VKI P+ K
Sbjct: 140 ...

Query: 135 ...
           ++L +GF+ V E + + A+KG ++ G
           +AL+VI G LID
Sbjct: 198 ... 249



Query: 193 GMLAGRIVPDNQAVYAKDFKPESILGASNLSFRGNVNISEGQSADVVFLNADNGMLVQDL 252

+I+G RG++ ++EG + DV+ + A+ L 

Sbjct: 250 AIVGV RGSLIVAEGDTNDVMLVGAE — QKPYKL 280 

Query: 253 PFEVKLKKFHI DFY NTGMPRDFA SDIEVTDKATGEKLER — TIRVNHPLT 300 

PFVLFIY N+ + FA SDIE+ + G K+E T++VN P 
Sbjct: 281 PFAVHLIDFRIKTYAEENPNVDKRFAQAVSSYESDIEIIN GGKVEAKGTVKVNE PFD 337 

Query: 301 LHGITIYQASFA--DGGSDLTFKAWNLRDASREP 332

++QA++ DG S + + + A +P 

Sbjct: 338 FGRYRLFQATYGILDGTSGMGVIWDRKKAHEDP 371 

Based on this analysis, including the putative transmembrane domain in the gonococcal protein, 
it is predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could 
be useful antigens for vaccines or diagnostics, or for raising antibodies. 
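A putative transmembrane domain of the kind mentioned above is commonly located with a hydropathy plot. The following is a minimal sketch, assuming the Kyte-Doolittle scale with a 19-residue window and a mean-hydropathy cut-off of 1.6 (values often used for membrane-spanning segments, not parameters stated in this document); the function names are illustrative only, and residues outside the scale (such as X) are scored as 0.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein: str, window: int = 19) -> list[float]:
    """Mean hydropathy of every full window along the sequence.

    Unknown residues (for example X) contribute 0. Sequences shorter than
    the window give an empty profile.
    """
    scores = [KD.get(aa, 0.0) for aa in protein.upper()]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


def candidate_tm_segments(protein: str, window: int = 19,
                          threshold: float = 1.6) -> list[int]:
    """1-based start positions of windows whose mean hydropathy exceeds threshold."""
    profile = hydropathy_profile(protein, window)
    return [i + 1 for i, score in enumerate(profile) if score > threshold]


if __name__ == "__main__":
    # A toy hydrophobic stretch flanked by charged residues.
    toy = "K" * 10 + "L" * 19 + "K" * 10
    print(candidate_tm_segments(toy))
```

Overlapping windows above the cut-off would normally be merged into a single predicted segment; that step is omitted here for brevity.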



Example 40 

The following DNA sequence, believed to be complete, was identified in N. meningitidis <SEQ ID 337>:

1 ATGATGAGTA ATAmAATGGm ACAAAAAGGG TTTACATTGA TTGmGmTGAT 

51 GATAGTCGTC GCGATACTCG GCATTATCAG CGTCATTGCC ATACCTTCTT 

101 ATCmAAGTTA TATTGAAAAA GGCTATCAGT CCCAGCTTTA TACGGAGATG 

151 GyCGGTATCA ACAATATTTC CAAACAGTTT ATTTTGAAAA ATCCCCTGGA 

201 CGATAATCAG ACCATCGAGA ACAAACTGGA AATATTTGTC TCAGGCTATA 

251 AGATGAATCC GAAAATTGCC AAAAAaTATA GTGTTTCGGT AAAGTTTGTC 

301 GATAAGGAAA AATCAAGGGC ATACAGGTTG GTCGGCGTTC CGAAGGCGGG 

351 GACGGGTTAT ACTTTGTCGG TATGGATGAA CAGCGTGGGC GACGGATACA 

401 AATGCCGTGA TGCCGCTTCT GCCCAAGCCC ATTTGGAGAC CTTGTCCTCA 

451 GATGTCGGCT GTGAAGCCTT CTCTAATCGT AAAAAATAA 

This corresponds to the amino acid sequence <SEQ ID 338; ORF89>: 

1 MMSNXMXQKG FTLIXXMIVV AILGIISVIA IPSYXSYIEK GYQSQLYTEM

51 XGINNISKQF ILKNPLDDNQ TIENKLEIFV SGYKMNPKIA KKYSVSVKFV 

101 DKEKSRAYRL VGVPKAGTGY TLSVWMNSVG DGYKCRDAAS AQAHLETLSS 

151 DVGCEAFSNR KK* 
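In ORF89 the residue X appears wherever the first-pass DNA sequence carries an IUPAC ambiguity code (m, y) in a codon. A minimal Python sketch of frame-1 translation with the standard genetic code that reproduces this behaviour by mapping any codon containing a base other than A, C, G or T to X; the names are illustrative, and this is not necessarily how the translations in this document were generated.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), listed in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate a coding sequence in frame 1.

    Any codon containing a non-ACGT symbol (e.g. the ambiguity codes m or y)
    becomes X. This is a simplification: some ambiguous codons would still
    specify a single amino acid. Stop codons are shown as '*', and a trailing
    incomplete codon is ignored.
    """
    seq = dna.upper()
    protein = []
    for i in range(0, len(seq) - 2, 3):
        protein.append(CODON_TABLE.get(seq[i:i + 3], "X"))
    return "".join(protein)


if __name__ == "__main__":
    # First 30 nt of SEQ ID 337 (with ambiguity codes) and SEQ ID 339.
    print(translate("ATGATGAGTAATAmAATGGmACAAAAAGGG"))
    print(translate("ATGATGAGTAATAAAATGGAACAAAAAGGG"))
```

The two calls give the N-terminal decapeptides of ORF89 and ORF89-1 respectively, showing how resolving the ambiguous bases turns X into K and E.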

Further work revealed the complete nucleotide sequence <SEQ ID 339>: 

1 ATGATGAGTA ATAAAATGGA ACAAAAAGGG TTTACATTGA TTGAGATGAT

51 GATAGTCGTC GCGATACTCG GCATTATCAG CGTCATTGCC ATACCTTCTT 

101 ATCAAAGTTA TATTGAAAAA GGCTATCAGT CCCAGCTTTA TACGGAGATG 

151 GTCGGTATCA ACAATATTTC CAAACAGTTT ATTTTGAAAA ATCCCCTGGA 

201 CGATAATCAG ACCATCGAGA ACAAACTGGA AATATTTGTC TCAGGCTATA 

251 AGATGAATCC GAAAATTGCC AAAAAATATA GTGTTTCGGT AAAGTTTGTC 

301 GATAAGGAAA AATCAAGGGC ATACAGGTTG GTCGGCGTTC CGAAGGCGGG 

351 GACGGGTTAT ACTTTGTCGG TATGGATGAA CAGCGTGGGC GACGGATACA 

401 AATGCCGTGA TGCCGCTTCT GCCCAAGCCC ATTTGGAGAC CTTGTCCTCA

451 GATGTCGGCT GTGAAGCCTT CTCTAATCGT AAAAAATAA 

This corresponds to the amino acid sequence <SEQ ID 340; ORF89-1>:

1 MMSNKMEQKG FTLIEMMIVV AILGIISVIA IPSYQSYIEK GYQSQLYTEM

51 VGINNISKQF ILKNPLDDNQ TIENKLEIFV SGYKMNPKIA KKYSVSVKFV 

101 DKEKSRAYRL VGVPKAGTGY TLSVWMNSVG DGYKCRDAAS AQAHLETLSS 

151 DVGCEAFSNR KK* 

Computer analysis of this amino acid sequence gave the following results: 
Homology with PilE of N. gonorrhoeae (accession number Z69260). 



ORF89 and PilE protein show 30% aa identity in 120 aa overlap:
orf89 8 QKGFTLIXXMIVVAILGIISVIAIPSYXSYIEKGYQSQLYTEMXGINNISKQFILKNPL- 66

QKGFTLI MIV+AI+GI++ +A+P+Y Y+ S+ G + ++L++ 

PilE 5 QKGFTLIELMIVIAIVGILAAVALPAYQDYTARAQVSEAILLAEGQKSAVTEYYLNHGIW 64 

orf89 67 -DDNQTIENKLEIFVSGYKMNPKIAKKYSVSVKFVDKEKSRAYRLVGVPKAGTGYTLSVW 125

DN + +G + KI KY SV + GV K G LS+W 

PilE 65 PKDNTS AGVASS DKIKGKYVQS VTVAKGWTAEMASTGVNKE IQGKKLS LW 115 

Homology with a predicted ORF from N. meningitidis (strain A)
ORF89 shows 83.3% identity over a 162 aa overlap with an ORF (ORF89a) from strain A of N. meningitidis:

                    10        20        30        40        50        60
orf89.pep   MMSNXMXQKGFTLIXXMIVVAILGIISVIAIPSYXSYIEKGYQSQLYTEMXGINNISKQF
            |||| | |||||||||    ||    |||    ||||||||||||||||| ||||||||
orf89a      MMSNKMEQKGFTLIXXXXXXAIXXXXSVIXXXXYXSYIEKGYQSQLYTEMVGINNISKQX
                    10        20        30        40        50        60

                    70        80        90       100       110       120
orf89.pep   ILKNPLDDNQTIENKLEIFVSGYKMNPKIAKKYSVSVKFVDKEKSRAYRLVGVPKAGTGY
            ||||||||||||::||||||||||||||||:||:|||:||::|| ||| ||||||:||||
orf89a      ILKNPLDDNQTIKSKLEIFVSGYKMNPKIAEKYNVSVHFVNEEKPRAYSLVGVPKTGTGY
                    70        80        90       100       110       120

                   130       140       150       160
orf89.pep   TLSVWMNSVGDGYKCRDAASAQAHLETLSSDVGCEAFSNRKKX
            |||||||||||||||||||||:|||||||||||||||||||||
orf89a      TLSVWMNSVGDGYKCRDAASARAHLETLSSDVGCEAFSNRKKX
                   130       140       150       160

The complete length ORF89a nucleotide sequence <SEQ ID 341> is: 

1 ATGATGAGTA ATAAAATGGA ACAAAAAGGG TTTACATTGA TTGNGANGNT

51 NATNGNCNTC GCGATACNCN GCNTTANCAG CGTCATTNCN ATNNNTNCNT 

101 ATCNNAGTTA TATTGAAAAA GGCTATCAGT CCCAGCTTTA TACGGAGATG 

151 GTCGGTATCA ACAATATTTC CAAACAGTNT ATTTTGAAAA ATCCCCTGGA 

201 CGATAATCAG ACCATCAAGA GCAAACTGGA AATATTTGTC TCAGGCTATA 

251 AGATGAATCC GAAAATTGCC GAAAAATATA ATGTTTCGGT GCATTTTGTC

301 AATGAGGAAA AACCNAGGGC ATACAGCTTG GTCGGCGTTC CAAAGACGGG 

351 GACGGGTTAT ACTTTGTCGG TATGGATGAA CAGCGTGGGC GACGGATACA 

401 AATGCCGTGA TGCCGCTTCT GCCCGAGCCC ATTTGGAGAC CTTGTCCTCA 

451 GATGTCGGCT GTGAAGCCTT CTCTAATCGT AAAAAATAG 

This encodes a protein having amino acid sequence <SEQ ID 342>:

1 MMSNKMEQKG FTLIXXXXXX AIXXXXSVIX XXXYXSYIEK GYQSQLYTEM 

51 VGINNISKQX ILKNPLDDNQ TIKSKLEIFV SGYKMNPKIA EKYNVSVHFV 

101 NEEKPRAYSL VGVPKTGTGY TLSVWMNSVG DGYKCRDAAS ARAHLETLSS 

151 DVGCEAFSNR KK* 

ORF89a and ORF89-1 show 83.3% identity in 162 aa overlap:

                    10        20        30        40        50        60
orf89a.pep  MMSNKMEQKGFTLIXXXXXXAIXXXXSVIXXXXYXSYIEKGYQSQLYTEMVGINNISKQX
            ||||||||||||||      ||    |||    | ||||||||||||||||||||||||
orf89-1     MMSNKMEQKGFTLIEMMIVVAILGIISVIAIPSYQSYIEKGYQSQLYTEMVGINNISKQF
                    10        20        30        40        50        60

                    70        80        90       100       110       120
orf89a.pep  ILKNPLDDNQTIKSKLEIFVSGYKMNPKIAEKYNVSVHFVNEEKPRAYSLVGVPKTGTGY
            ||||||||||||::||||||||||||||||:||:|||:||::|| ||| ||||||:||||
orf89-1     ILKNPLDDNQTIENKLEIFVSGYKMNPKIAKKYSVSVKFVDKEKSRAYRLVGVPKAGTGY
                    70        80        90       100       110       120

                   130       140       150       160
orf89a.pep  TLSVWMNSVGDGYKCRDAASARAHLETLSSDVGCEAFSNRKKX
            |||||||||||||||||||||:|||||||||||||||||||||
orf89-1     TLSVWMNSVGDGYKCRDAASAQAHLETLSSDVGCEAFSNRKKX
                   130       140       150       160



Homology with a predicted ORF from N. gonorrhoeae

ORF89 shows 84.6% identity over a 162 aa overlap with a predicted ORF (ORF89.ng) from N. gonorrhoeae:

orf89       MMSNXMXQKGFTLIXXMIVVAILGIISVIAIPSYXSYIEKGYQSQLYTEMXGINNISKQF 60
            |||| | |||||||  ||||:||||||||||||| ||||||||||||||| ||||: |||
orf89ng     MMSNKMEQKGFTLIEMMIVVTILGIISVIAIPSYQSYIEKGYQSQLYTEMVGINNVLKQF 60

orf89       ILKNPLDDNQTIENKLEIFVSGYKMNPKIAKKYSVSVKFVDKEKSRAYRLVGVPKAGTGY 120
            ||||| |||:|:: ||:||||||||||||||||||||:||| || |||||||||:|||||
orf89ng     ILKNPQDDNDTLKSKLKIFVSGYKMNPKIAKKYSVSVRFVDAEKPRAYRLVGVPNAGTGY 120

orf89       TLSVWMNSVGDGYKCRDAASAQAHLETLSSDVGCEAFSNRKK 162
            ||||||||||||||||||:||||: :|||:| ||||||||||
orf89ng     TLSVWMNSVGDGYKCRDATSAQAYSDTLSADSGCEAFSNRKK 162



The complete length ORF89ng nucleotide sequence <SEQ ID 343> is: 



1 aTGATGAGCA ATAAAATGGA ACAAAAAGGG TTTACATTGA TTGAGATGAT
51 GATAGTTGTC ACGATACTCG GCATCATCAG CGTCATTGCC ATACCTTCTT
101 ATCAGAGTTA TATTGAAAAA GGCTATCAGT CCCAGCTTTA TACGGAGATG
151 GTCGGTATCA ACAATGTTCT CAAACAGTTT ATTTTGAAAA ATCCCCAGGA
201 CGATAATGAT ACCCTCAAGA GCAAACTGAA AATATTTGTC TCAGGCTATA
251 AGATGAATCC GAAAAttgCC AAAAAATATA GTGTTTCGGt aaggtttGTC
301 gatGCGGAAA AACCAAGGGC ATACAGGTTG GTCGGCGTTC CGAACGCGGG
351 GACGGGTTAT ACTTTGTCGG TATGGATGAA CAGCGTGGGC GACGGATACA
401 AATGCCGTGA TGCCACTTCT GCCCAGGCCT ATTCGGACAC CTTGTCCGCA
451 GATAGCGGCT GTGAAGCTTT CTCTAATCGT AAAAAATAG



This encodes a protein having amino acid sequence <SEQ ID 344>: 

1 MMSNKMEQKG FTLIEMMIVV TILGIISVIA IPSYQSYIEK GYQSQLYTEM

51 VGINNVLKQF ILKNPQDDND TLKSKLKIFV SGYKMNPKIA KKYSVSVRFV 

101 DAEKPRAYRL VGVPNAGTGY TLSVWMNSVG DGYKCRDATS AQAYSDTLSA

151 DSGCEAFSNR KK* 



This gonococcal protein has a putative leader peptide (underlined) and an N-terminal methylation site
(NMePhe or type-4 pili, double-underlined). In addition, ORF89ng and ORF89-1 show 88.3%
identity in 162 aa overlap:

                    10        20        30        40        50        60
orf89-1.pep MMSNKMEQKGFTLIEMMIVVAILGIISVIAIPSYQSYIEKGYQSQLYTEMVGINNISKQF
            ||||||||||||||||||||:||||||||||||||||||||||||||||||||||: |||
orf89ng     MMSNKMEQKGFTLIEMMIVVTILGIISVIAIPSYQSYIEKGYQSQLYTEMVGINNVLKQF
                    10        20        30        40        50        60

                    70        80        90       100       110       120
orf89-1.pep ILKNPLDDNQTIENKLEIFVSGYKMNPKIAKKYSVSVKFVDKEKSRAYRLVGVPKAGTGY
            ||||| |||:|:: ||:||||||||||||||||||||:||| || |||||||||:|||||
orf89ng     ILKNPQDDNDTLKSKLKIFVSGYKMNPKIAKKYSVSVRFVDAEKPRAYRLVGVPNAGTGY
                    70        80        90       100       110       120

                   130       140       150       160
orf89-1.pep TLSVWMNSVGDGYKCRDAASAQAHLETLSSDVGCEAFSNRKKX
            ||||||||||||||||||:||||: :|||:| |||||||||||
orf89ng     TLSVWMNSVGDGYKCRDATSAQAYSDTLSADSGCEAFSNRKKX
                   130       140       150       160



Based on this analysis, including the gonococcal motifs and the homology with the known PilE
protein, it was predicted that these proteins from N. meningitidis and N. gonorrhoeae, and their
epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies.
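The gonococcal motifs referred to above (a short leader peptide followed by an N-methylated Phe) resemble the pattern commonly described for type IV prepilins: cleavage after a Gly, with Phe at position +1 and Glu at position +5 of the mature protein. The following is a minimal regular-expression sketch, assuming that G | F-x-x-x-E consensus; the function name and the 40-residue search limit are illustrative assumptions, and the sketch only flags candidate sites without reproducing the underlining in the original figures.

```python
import re

# Assumed class IVa prepilin consensus: Gly at -1, Phe at +1, Glu at +5.
# The lookahead keeps the match on the Gly itself.
PREPILIN_SITE = re.compile(r"G(?=F.{3}E)")


def prepilin_cleavage_sites(protein: str, max_position: int = 40) -> list[int]:
    """1-based positions of Gly residues after which prepilin cleavage could occur.

    Only sites within the first `max_position` residues are reported, since
    type IV pilin leader peptides are short and N-terminal.
    """
    seq = protein.upper()
    return [m.start() + 1 for m in PREPILIN_SITE.finditer(seq)
            if m.start() < max_position]


if __name__ == "__main__":
    orf89_1 = "MMSNKMEQKGFTLIEMMIVVAILGIISVIAIPSYQSYIEKGYQSQLYTEM"  # ORF89-1, residues 1-50
    for site in prepilin_cleavage_sites(orf89_1):
        print(site, orf89_1[:site], orf89_1[site:site + 10])
```

For the ORF89-1 N-terminus this flags the Gly at position 10, which would leave a mature chain beginning FTLIE, the same motif shared with PilE in the alignment above.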
ORF89-1 (13.6 kDa) was cloned in the pGex vector and expressed in E. coli, as described above.
The products of protein expression and purification were analyzed by SDS-PAGE. Figure 11A
shows the results of affinity purification of the GST-fusion protein. Purified GST-fusion protein
was used to immunise mice, whose sera gave a positive result in the ELISA test, confirming that
ORF89-1 is a surface-exposed protein, and that it is a useful immunogen.
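The 13.6 kDa value above is taken from the document, and the exact boundaries of the cloned construct are not given in this section, so the sketch below does not attempt to reproduce it. It only shows, under stated assumptions, how an average molecular mass is usually estimated from a sequence: standard average residue masses summed plus one water, for an unmodified chain with no GST tag and no N-methylation. The function name is illustrative.

```python
# Average residue masses in daltons (free amino acid minus one water).
AVERAGE_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.0153


def average_mass(protein: str) -> float:
    """Average molecular mass (Da) of an unmodified polypeptide.

    A trailing stop symbol '*' is ignored; any other residue outside the
    table (for example X) raises KeyError rather than being guessed.
    """
    residues = protein.upper().rstrip("*")
    return sum(AVERAGE_RESIDUE_MASS[aa] for aa in residues) + WATER


if __name__ == "__main__":
    print(round(average_mass("GG"), 2))  # glycylglycine
```

Dividing the result by 1000 gives kDa; for a fusion protein the mass of the tag would have to be added.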



Example 41 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 345>:

1 ATGAAAAAAT CCTCCCTCAT CAGCGCATTG GGCATCGGTA TTTTGAGCAT 

51 CGGCATGGCA TTTGCCGCCC CTGCCGACGC GGTAAGCCAA ATCCGTCAAA 

101 ACGCCACTCA AGTATTGAGC ATCTTAAAAA ACGGCGATGC CAACACCGCT 

151 CGCCAAAAAG CCGAAGCCTA TGCGATTCCC TATTTCGATT TCCAACGTAT 

201 GACCGCATTG GCGGTCGGCA ACCCTTGGsG CACCG.GTCC GACG.GCAAA 

251 AACAAGCGTT GGCCn.AGAA TTTCAACCC. . . 

This corresponds to the amino acid sequence <SEQ ID 346; ORF91>:

1 MKKSSLISAL GIGILSIGMA FAAPADAVSQ IRQNATQVLS ILKNGDANTA 
51 RQKAEAYAIP YFDFQRMTAL AVGNPWXTXS DXQKQALAXE FQP. . . 

Further work revealed the complete nucleotide sequence <SEQ ID 347>: 

1 ATGAAAAAAT CCTCCCTCAT CAGCGCATTG GGCATCGGTA TTTTGAGCAT 

51 CGGCATGGCA TTTGCCGCCC CTGCCGACGC GGTAAGCCAA ATCCGTCAAA 

101 ACGCCACTCA AGTATTGAGC ATCTTAAAAA ACGGCGATGC CAACACCGCT 

151 CGCCAAAAAG CCGAAGCCTA TGCGATTCCC TATTTCGATT TCCAACGTAT 

201 GACCGCATTG GCGGTCGGCA ACCCTTGGCG CACCGCGTCC GACGCGCAAA 

251 AACAAGCGTT GGCCAAAGAA TTTCAAACCC TGCTGATCCG CACCTATTCC 

301 GGCACGATGC TGAAATTAAA AAACGCCAAC GTCAACGTCA AAGACAATCC 

351 CATCGTCAAT AAAGGCGGCA AAGAAATCAT CGTCCGCGCC GAAGTCGGCG 

401 TACCCGGGCA AAAACCCGTC AACATGGACT TCACCACCTA CCAAAGCGGC 

451 GGTAAATACC GTACCTACAA CGTCGCCATC GAAGGCGCGA GCCTGGTTAC 

501 CGTGTACCGC AACCAATTCG GCGAAATTAT CAAAGCGAAA GGCGTGGACG 

551 GACTGATTGC CGAGTTGAAA GCCAAAAACG GCGGCAAATA A 

This corresponds to the amino acid sequence <SEQ ID 348; ORF91-1>:

1 MKKSSLISAL GIGILSIGMA FAAPADAVSQ IRQNATQVLS ILKNGDANTA

51 RQKAEAYAIP YFDFQRMTAL AVGNPWRTAS DAQKQALAKE FQTLLIRTYS 

101 GTMLKLKNAN VNVKDNPIVN KGGKEIIVRA EVGVPGQKPV NMDFTTYQSG 

151 GKYRTYNVAI EGASLVTVYR NQFGEIIKAK GVDGLIAELK AKNGGK* 

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A) 

ORF91 shows 92.4% identity over a 92 aa overlap with an ORF (ORF91a) from strain A of N. meningitidis:

                    10        20        30        40        50        60
orf91.pep   MKKSSLISALGIGILSIGMAFAAPADAVSQIRQNATQVLSILKNGDANTARQKAEAYAIP
            |||||:||||||||||||||||||||||:||||||||||||||:||||||||||||||||
orf91a      MKKSSFISALGIGILSIGMAFAAPADAVNQIRQNATQVLSILKSGDANTARQKAEAYAIP
                    10        20        30        40        50        60

                    70        80        90
orf91.pep   YFDFQRMTALAVGNPWXTXSDXQKQALAXEFQP
            |||||||||||||||| | || |||||| |||
orf91a      YFDFQRMTALAVGNPWRTASDAQKQALAKEFQTLLIRTYSGTMLKLKNANVNVKDNPIVN
                    70        80        90       100       110       120

orf91a      KGGKEIIVRAEVGVPGQKPVNMDFTTYQSGGKYRTYNVAIEGASLVTVYRNQFGEIIKAK
                   130       140       150       160       170       180

The complete length ORF91a nucleotide sequence <SEQ ID 349> is: 

1 ATGAAAAAAT CCTCCTTCAT CAGCGCATTG GGCATCGGTA TTTTGAGCAT

51 CGGCATGGCA TTTGCCGCCC CTGCCGACGC GGTAAACCAA ATCCGTCAAA 

101 ACGCCACTCA AGTATTGAGC ATCTTAAAAA GCGGTGATGC CAACACCGCC 

151 CGCCAAAAAG CCGAAGCCTA TGCGATTCCC TATTTCGATT TCCAACGTAT 

201 GACCGCATTG GCGGTCGGCA ACCCTTGGCG CACCGCGTCC GACGCGCAAA 

251 AACAAGCGTT GGCCAAAGAA TTTCAAACCC TGCTGATCCG CACCTATTCC

301 GGCACGATGC TGAAATTAAA AAACGCCAAC GTCAACGTCA AAGACAATCC 

351 CATCGTCAAT AAAGGCGGCA AAGAAATCAT CGTCCGCGCC GAAGTCGGCG 

401 TACCCGGGCA AAAACCCGTC AACATGGACT TCACCACCTA CCAAAGCGGC 

451 GGTAAATACC GTACCTACAA CGTCGCCATC GAAGGCGCGA GCCTGGTTAC 

501 CGTGTACCGC AACCAATTCG GCGAAATTAT CAAAGCGAAA GGCGTGGACG

551 GACTGATTGC CGAGTTGAAG GCTAAAAACG GCAGCAAGTA A 

This encodes a protein having amino acid sequence <SEQ ID 350>: 

1 MKKSSFISAL GIGILSIGMA FAAPADAVNQ IRQNATQVLS ILKSGDANTA

51 RQKAEAYAIP YFDFQRMTAL AVGNPWRTAS DAQKQALAKE FQTLLIRTYS 

101 GTMLKLKNAN VNVKDNPIVN KGGKEIIVRA EVGVPGQKPV NMDFTTYQSG

151 GKYRTYNVAI EGASLVTVYR NQFGEIIKAK GVDGLIAELK AKNGSK* 

ORF91a and ORF91-1 show 98.0% identity in 196 aa overlap: 

                    10        20        30        40        50        60
orf91a.pep  MKKSSFISALGIGILSIGMAFAAPADAVNQIRQNATQVLSILKSGDANTARQKAEAYAIP
            |||||:||||||||||||||||||||||:||||||||||||||:||||||||||||||||
orf91-1     MKKSSLISALGIGILSIGMAFAAPADAVSQIRQNATQVLSILKNGDANTARQKAEAYAIP
                    10        20        30        40        50        60

                    70        80        90       100       110       120
orf91a.pep  YFDFQRMTALAVGNPWRTASDAQKQALAKEFQTLLIRTYSGTMLKLKNANVNVKDNPIVN
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf91-1     YFDFQRMTALAVGNPWRTASDAQKQALAKEFQTLLIRTYSGTMLKLKNANVNVKDNPIVN
                    70        80        90       100       110       120

                   130       140       150       160       170       180
orf91a.pep  KGGKEIIVRAEVGVPGQKPVNMDFTTYQSGGKYRTYNVAIEGASLVTVYRNQFGEIIKAK
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf91-1     KGGKEIIVRAEVGVPGQKPVNMDFTTYQSGGKYRTYNVAIEGASLVTVYRNQFGEIIKAK
                   130       140       150       160       170       180

                   190
orf91a.pep  GVDGLIAELKAKNGSKX
            ||||||||||||||:||
orf91-1     GVDGLIAELKAKNGGKX
                   190
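Percent identities such as the 98.0% quoted for ORF91a and ORF91-1 come from counting identical positions across an alignment. Because these two sequences align end to end without gaps, the calculation reduces to a position-by-position comparison. The sketch below assumes that gap-free case; the function name is illustrative, and gapped alignments, as in the BLAST outputs, need a proper alignment step first.

```python
# ORF91a (SEQ ID 350, translated from SEQ ID 349) and ORF91-1 (SEQ ID 348),
# in 10-residue blocks; the terminal stop is omitted.
ORF91A = "".join([
    "MKKSSFISAL", "GIGILSIGMA", "FAAPADAVNQ", "IRQNATQVLS", "ILKSGDANTA",
    "RQKAEAYAIP", "YFDFQRMTAL", "AVGNPWRTAS", "DAQKQALAKE", "FQTLLIRTYS",
    "GTMLKLKNAN", "VNVKDNPIVN", "KGGKEIIVRA", "EVGVPGQKPV", "NMDFTTYQSG",
    "GKYRTYNVAI", "EGASLVTVYR", "NQFGEIIKAK", "GVDGLIAELK", "AKNGSK",
])
ORF91_1 = "".join([
    "MKKSSLISAL", "GIGILSIGMA", "FAAPADAVSQ", "IRQNATQVLS", "ILKNGDANTA",
    "RQKAEAYAIP", "YFDFQRMTAL", "AVGNPWRTAS", "DAQKQALAKE", "FQTLLIRTYS",
    "GTMLKLKNAN", "VNVKDNPIVN", "KGGKEIIVRA", "EVGVPGQKPV", "NMDFTTYQSG",
    "GKYRTYNVAI", "EGASLVTVYR", "NQFGEIIKAK", "GVDGLIAELK", "AKNGGK",
])


def compare_ungapped(seq_a: str, seq_b: str) -> tuple[float, list[tuple[int, str, str]]]:
    """Percent identity and 1-based substitutions for two gap-free, equal-length sequences.

    Identical symbols count as matches, including X against X.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same length (no gaps)")
    differences = [
        (pos, a, b)
        for pos, (a, b) in enumerate(zip(seq_a, seq_b), start=1)
        if a != b
    ]
    identity = 100.0 * (len(seq_a) - len(differences)) / len(seq_a)
    return identity, differences


if __name__ == "__main__":
    identity, diffs = compare_ungapped(ORF91A, ORF91_1)
    print(f"{identity:.1f}% identity in {len(ORF91A)} aa overlap", diffs)
```

This gives 192 identical residues out of 196 (substitutions at positions 6, 29, 44 and 195), consistent with the 98.0% figure above. Applying the same convention to ORF89 and ORF89a, with X against X counted as a match, gives 135/162, in line with the 83.3% reported for that pair.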

Homology with a predicted ORF from N. gonorrhoeae

ORF91 shows 84.8% identity over a 92 aa overlap with a predicted ORF (ORF91.ng) from N. gonorrhoeae:

orf91.pep   MKKSSLISALGIGILSIGMAFAAPADAVSQIRQNATQVLSILKNGDANTARQKAEAYAIP 60
            :||||:||||||||||||||||:|||||:||||||||||:|||:||| :|| ||||||:|
orf91ng     VKKSSFISALGIGILSIGMAFASPADAVGQIRQNATQVLTILKSGDAASARPKAEAYAVP 60

orf91.pep   YFDFQRMTALAVGNPWXTXSDXQKQALAXEFQP 93
            |||||||||||||||| | || |||||| |||
orf91ng     YFDFQRMTALAVGNPWRTASDAQKQALAKEFQTLLIRTYSGTMLKFKNATVNVKDNPIVN 120

The complete length ORF91ng nucleotide sequence <SEQ ID 351> is predicted to encode a protein
having amino acid sequence <SEQ ID 352>: 
1 VKKSSFISAL GIGILSIGMA FASPADAVGQ IRQNATQVLT ILKSGDAASA

51 RPKAEAYAVP YFDFQRMTAL AVGNPWRTAS DAQKQALAKE FQTLLIRTYS 

101 GTMLKFKNAT VNVKDNPIVN KGGKEIVVRA EVGIPGQKPV NMDFTTYQSG

151 GKYRTYNVAI EGTSLVTVYR NQFGEIIKAK GIDGLIAELK AKNGGK* 

Further work revealed the complete nucleotide sequence <SEQ ID 353>: 



1 ATGAAAAAAT CCTCCTTCAT CAGCGCATTG GGCATCGGTA TTTTGAGCAT
51 CGGCATGGCA TTTGCCTCCC CGGCCGACGC AGTGGGACAA ATCCGCCAAA
101 ACGCCACACA GGTTTTGACC ATCCTCAAAA GCGGCGACGC GGCTTCTGCA
151 CGCCCAAAAG CCGAAGCCTA TGCGGTTCCC TATTTCGATT TCCAACGTAT
201 GACCGCATTG GCGGTCGGCA ACCCTTGGCG TACCGCGTCC GACGCGCAAA
251 AACAAGCGTT GGCCAAAGAA TTTCAAACCC TGCTGATCCG CACCTATTCC
301 GGCACGATGC TGAAATTCAA AAACGCGACC GTCAACGTCA AAGACAATCC
351 CATCGTCAAT AAGGGCGGCA AGGAAATCGT CGTCCGTGCC GAAGTCGGCA
401 TCCCCGGTCA GAAGCCCGTC AATATGGACT TTACCACCTA CCAAAGCGGC
451 GGCAAATACC GTACCTACAA CGTCGCCATC GAAGGCACGA GCCTGGTTAC
501 CGTGTACCGC AACCAATTCG GCGAAATCAT CAAAGCCAAA GGCATCGACG
551 GGCTGATTGC CGAGTTGAAA GCCAAAAACG GCGGCAAATA A



This corresponds to the amino acid sequence <SEQ ID 354; ORF91ng-1>:



1 MKKSSFISAL GIGILSIGMA FASPADAVGQ IRQNATQVLT ILKSGDAASA

51 RPKAEAYAVP YFDFQRMTAL AVGNPWRTAS DAQKQALAKE FQTLLIRTYS 

101 GTMLKFKNAT VNVKDNPIVN KGGKEIVVRA EVGIPGQKPV NMDFTTYQSG

151 GKYRTYNVAI EGTSLVTVYR NQFGEIIKAK GIDGLIAELK AKNGGK* 

ORF91ng-1 and ORF91-1 show 92.3% identity in 196 aa overlap:

                    10        20        30        40        50        60
orf91-1.pep MKKSSLISALGIGILSIGMAFAAPADAVSQIRQNATQVLSILKNGDANTARQKAEAYAIP
            |||||:||||||||||||||||:|||||:||||||||||:|||:||| :|| ||||||:|
orf91ng-1   MKKSSFISALGIGILSIGMAFASPADAVGQIRQNATQVLTILKSGDAASARPKAEAYAVP
                    10        20        30        40        50        60

                    70        80        90       100       110       120
orf91-1.pep YFDFQRMTALAVGNPWRTASDAQKQALAKEFQTLLIRTYSGTMLKLKNANVNVKDNPIVN
            |||||||||||||||||||||||||||||||||||||||||||||:|||:||||||||||
orf91ng-1   YFDFQRMTALAVGNPWRTASDAQKQALAKEFQTLLIRTYSGTMLKFKNATVNVKDNPIVN
                    70        80        90       100       110       120

                   130       140       150       160       170       180
orf91-1.pep KGGKEIIVRAEVGVPGQKPVNMDFTTYQSGGKYRTYNVAIEGASLVTVYRNQFGEIIKAK
            ||||||:||||||:||||||||||||||||||||||||||||:|||||||||||||||||
orf91ng-1   KGGKEIVVRAEVGIPGQKPVNMDFTTYQSGGKYRTYNVAIEGTSLVTVYRNQFGEIIKAK
                   130       140       150       160       170       180

                   190
orf91-1.pep GVDGLIAELKAKNGGKX
            |:|||||||||||||||
orf91ng-1   GIDGLIAELKAKNGGKX
                   190

In addition, ORF91ng-1 shows homology to a hypothetical E. coli protein:



sp|P45390|YRBC_ECOLI HYPOTHETICAL 24.0 KD PROTEIN IN MURA-RPON INTERGENIC
REGION PRECURSOR (F211) >gi 1606130 (U18997) ORF_f211 [Escherichia coli]
>gi|1789583 (AE000399) hypothetical 24.0 kD protein in murZ-rpoN intergenic
region [Escherichia coli] Length = 211

Score = 70.6 bits (170), Expect = 6e-12

Identities = 42/137 (30%), Positives = 76/137 (54%), Gaps = 6/137 (4%)

Query: 59  VPYFDFQRMTALAVGNPWRTASDAQKQALAKEFQTLLIRTYSGTMLKFKNATVNVKDNPI 118
           +PY   +   AL +G  +++A+ AQ++A    F+  L + Y   +  +   T  +   P
Sbjct: 65  LPYVQVKYAGALVLGQYYKSATPAQREAYFAAFREYLKQAYGQALAMYHGQTYQIA--PE 122

Query: 119 VNKGGKEIV-VRAEVGIP-GQKPVNMDFTTYQSG--GKYRTYNVAIEGTSLVTVYRNQFG 174
              G K IV +R  +  P G+ PV +DF   ++   G ++ Y++  EG S++T  +N++G
Sbjct: 123 QPLGDKTIVPIRVTIIDPNGRPPVRLDFQWRKNSQTGNWQAYDMIAEGVSMITTKQNEWG 182

Query: 175 EIIKAKGIDGLIAELKA 191
            +++ KGIDGL A+LK+
Sbjct: 183 TLLRTKGIDGLTAQLKS 199
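Both BLAST hits in this example report a raw score together with a bit score (231 and 94.4 bits for the Aquifex aeolicus protein, 170 and 70.6 bits for the E. coli protein). The bit score is the Karlin-Altschul normalisation S' = (λS - ln K) / ln 2. The λ and K values below are assumptions: they are not stated in the document, but λ = 0.270 and K = 0.047 reproduce both reported bit scores. The Expect value additionally depends on the query and database sizes, which are not given, so it is not recomputed here.

```python
import math

# Assumed Karlin-Altschul parameters (not stated in the text); these values
# reproduce the bit scores reported for both BLAST hits above.
LAMBDA = 0.270
K = 0.047


def bit_score(raw_score: float, lam: float = LAMBDA, k: float = K) -> float:
    """Normalised bit score S' = (lambda * S - ln K) / ln 2."""
    return (lam * raw_score - math.log(k)) / math.log(2)


if __name__ == "__main__":
    for raw in (231, 170):
        print(raw, round(bit_score(raw), 1))
```

Because the bit score is independent of database size, it is the more convenient figure when comparing the two hits with each other.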

Based on this analysis, including the presence of a putative leader sequence in the gonococcal
protein, it is predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes,
could be useful antigens for vaccines or diagnostics, or for raising antibodies.



Example 42 

The following DNA sequence was identified in N. meningitidis <SEQ ID 355>:

1 ATGAAACACA TACTCCCCCT GATTGCCGCA TCCGCACTCT GCATTTCAAC 

51 CGCTTCGGCA CATCCTGCCA GCGAACCGTC CACTCAAAAC GAAACCGCTA 

101 TGATCACGCA TACCCTCATC TCAAAATACA GTTTTGGnnn nnnnnnnnnn 

151 nnnnnnnnnn nnGCCATAAA AAGCAAAGGG ATGGACATTT TTGCCGTCAT 

201 CGACCATCAG GAAGCCGCAC GCCGAAACGG CTTAACGATG CAGCCGGCAA 

251 AAGTCATCGT CTTCGGCACG CCCAAAGCCG GCACGCCGCT GATGGTCAAA 

301 GACCCCGCCT TCGCCCTGCA ACTGCCCCTA CGCGTCCTCG TTACCGAAAC 

351 GGACGGCAAA GTACGCGCCG CCTATACCGA TACGCGCGCC CTCATCGCCG 

401 GCAGCCGCAT CGGTTTCGAC GAAGTGGCAA ACACTTTGGC AAACGCCGAA 

451 AAACTGATAC AAAAAACCGT AGGCGAATAA 

This corresponds to the amino acid sequence <SEQ ID 356; ORF97>: 

1 MKHILPLIAA SALCISTASA HPASEPSTQN ETAMITHTLI SKYSFGXXXX 
51 XXXXAIKSKG MDIFAVIDHQ EAARRNGLTM QPAKVIVFGT PKAGTPLMVK 
101 DPAFALQLPL RVLVTETDGK VRAAYTDTRA LIAGSRIGFD EVANTLANAE 
151 KLIQKTVGE* 

Further work revealed the complete nucleotide sequence <SEQ ID 357>:

1 ATGAAACACA TACTCCCCCT GATTGCCGCA TCCGCACTCT GCATTTCAAC 

51 CGCTTCGGCA CATCCTGCCA GCGAACCGTC CACCCAAAAC GAAACCGCTA 

101 TGACCACGCA TACCCTCACC TCAAAATACA GTTTTGACGA AACCGTCAGC 

151 CGCCTTGAAA CCGCCATAAA AAGCAAAGGG ATGGACATTT TTGCCGTCAT 

201 CGACCATCAG GAAGCCGCCC GCCGAAACGG CTTAACGATG CAGCCGGCAA 

251 AAGTCATCGT CTTCGGCACG CCCAAAGCCG GCACGCCGCT GATGGTCAAA 

301 GACCCCGCCT TCGCCCTGCA ACTGCCCCTA CGCGTCCTCG TTACCGAAAC 

351 GGACGGCAAA GTACGCGCCG CCTATACCGA TACGCGCGCC CTCATCGCCG 

401 GCAGCCGCAT CGGTTTCGAC GAAGTGGCAA ACACTTTGGC AAACGCCGAA 

451 AAACTGATAC AAAAAACCGT AGGCGAATAA 

This corresponds to the amino acid sequence <SEQ ID 358; ORF97-1>:

1 MKHILPLIAA SALCISTASA HPASEPSTQN ETAMTTHTLT SKYSFDETVS 
51 RLETAIKSKG MDIFAVIDHQ EAARRNGLTM QPAKVIVFGT PKAGTPLMVK 
101 DPAFALQLPL RVLVTETDGK VRAAYTDTRA LIAGSRIGFD EVANTLANAE 
151 KLIQKTVGE* 
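Each amino acid listing in this example is the codon-by-codon translation of the nucleotide listing before it. A minimal sketch of that step follows, assuming the standard genetic code (NCBI translation table 1). The patent does not name the software it used; the rule that a codon containing `N` is resolved only when every possible base gives the same residue is an assumption, chosen because it matches residue 46 of <SEQ ID 356>, where the codon `GGn` of <SEQ ID 355> appears as G rather than X.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI table 1); codons enumerated TTT, TTC, TTA, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate_codon(codon):
    """Translate one codon; an 'N' is resolved only if every expansion agrees."""
    options = [BASES if base == "N" else base for base in codon]
    if len(codon) != 3 or any(b not in BASES for opt in options for b in opt):
        return "X"  # other ambiguity codes or stray characters such as '.'
    residues = {CODON_TABLE["".join(p)] for p in product(*options)}
    return residues.pop() if len(residues) == 1 else "X"


def translate(dna):
    """Translate an in-frame listing; whitespace is ignored, '*' marks a stop codon."""
    seq = "".join(dna.split()).upper()
    return "".join(translate_codon(seq[i:i + 3]) for i in range(0, len(seq) - 2, 3))


# First 60 nucleotides of <SEQ ID 357>.
print(translate("ATGAAACACA TACTCCCCCT GATTGCCGCA TCCGCACTCT GCATTTCAAC CGCTTCGGCA"))
# prints: MKHILPLIAASALCISTASA
```

Applied to nucleotides 121-180 of <SEQ ID 355>, the same function yields `SKYSFGXXXXXXXXAIKSKG`, which is residues 41-60 of <SEQ ID 356>.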

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A)

ORF97 shows 88.7% identity over a 159aa overlap with an ORF (ORF97a) from strain A of N. meningitidis:

10 20 30 40 50 60 

orf97.pep MKHILPLIAASALCISTASAHPASEPSTQNETAMITHTLISKYSFGXXXXXXXXAIKSKG
I Mill llllllilll 111111:1111111 MM Mill : MMIII 
orf97a MXHILPLXXASALCISTASXHPASEPQTQNETAMTTHTLTSKYSFDETVSRLETAIKSKG

10 20 30 40 50 60 
70 80 90 100 110 120

orf97.pep MDIFAVIDHQEAARRNGLTMQPAKVIVFGTPKAGTPLMVKDPAFALQLPLRVLVTETDGK
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
orf97a MDIFAVIDHQEAARRNGLTMQPAKVIVFGTPKAGTPLMVKDPAFALQLPLRVXVTETDGK

70 80 90 100 110 120 



130 140 150 160 

orf97.pep VRAAYTDTRALIAGSRIGFDEVANTLANAEKLIQKTVGEX
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I : I I I 
orf97a VRAAYTDTRALIAGSRIGFDEVANTLANAEKLIQKTIGEX

130 140 150 160 
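The percent-identity figures quoted for these ungapped 159-residue comparisons can be reproduced by counting identical positions over the whole overlap. The sketch assumes that positions holding an undetermined residue (`X`) are scored as mismatches; that assumption reproduces both the 88.7% (ORF97 vs ORF97a) and 95.6% (ORF97a vs ORF97-1) values, but the patent does not describe the program actually used.

```python
def percent_identity(a, b):
    """Percent of aligned positions with the same residue; 'X' never counts as identical."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    same = sum(1 for x, y in zip(a, b) if x == y and x != "X")
    return 100.0 * same / len(a)


# Residues 1-159 (stop codon omitted) of <SEQ ID 356>, <SEQ ID 358> and <SEQ ID 360>.
ORF97 = "".join("""
MKHILPLIAA SALCISTASA HPASEPSTQN ETAMITHTLI SKYSFGXXXX XXXXAIKSKG MDIFAVIDHQ EAARRNGLTM
QPAKVIVFGT PKAGTPLMVK DPAFALQLPL RVLVTETDGK VRAAYTDTRA LIAGSRIGFD EVANTLANAE KLIQKTVGE
""".split())
ORF97_1 = "".join("""
MKHILPLIAA SALCISTASA HPASEPSTQN ETAMTTHTLT SKYSFDETVS RLETAIKSKG MDIFAVIDHQ EAARRNGLTM
QPAKVIVFGT PKAGTPLMVK DPAFALQLPL RVLVTETDGK VRAAYTDTRA LIAGSRIGFD EVANTLANAE KLIQKTVGE
""".split())
ORF97A = "".join("""
MXHILPLXXA SALCISTASX HPASEPQTQN ETAMTTHTLT SKYSFDETVS RLETAIKSKG MDIFAVIDHQ EAARRNGLTM
QPAKVIVFGT PKAGTPLMVK DPAFALQLPL RVXVTETDGK VRAAYTDTRA LIAGSRIGFD EVANTLANAE KLIQKTIGE
""".split())

print(f"ORF97 vs ORF97a:   {percent_identity(ORF97, ORF97A):.1f}%")
print(f"ORF97a vs ORF97-1: {percent_identity(ORF97A, ORF97_1):.1f}%")
```

Counting `X` as a mismatch leaves 141 and 152 identical positions respectively, that is 141/159 and 152/159.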

The complete length ORF97a nucleotide sequence <SEQ ID 359> is:



1 ATGANACACA TACTCCCCCT GANTGNCGCA TCCGCACTCT GCATTTCAAC

51 CGCTTCGGNN CATCCTGCCA GCGAACCGCA AACCCAAAAC GAAACCGCTA

101 TGACCACGCA TACCCTCACC TCAAAATACA GTTTTGACGA AACCGTCAGC

151 CGCCTTGAAA CCGCCATAAA AAGCAAAGGG ATGGACATTT TTGCCGTCAT

201 CGACCATCAG GAAGCCGCCC GCCGAAACGG CTTAACGATG CAGCCGGCAA

251 AAGTCATCGT CTTCGGCACG CCCAAAGCCG GTACGCCGCT GATGGTCAAA

301 GACCCCGCCT TCGCCCTGCA ACTGCCCCTG CGCGTCNTCG TTACCGAAAC

351 GGACGGCAAA GTACGCGCCG CCTATACCGA TACGCGCGCC CTCATCGCCG

401 GCAGCCGCAT CGGTTTCGAC GAAGTGGCAA ACACTTTGGC AAACGCCGAA

451 AAACTGATAC AAAAAACCAT AGGCGAATAA

This encodes a protein having amino acid sequence <SEQ ID 360>:



1 MXHILPLXXA SALCISTASX HPASEPQTQN ETAMTTHTLT SKYSFDETVS 

51 RLETAIKSKG MDIFAVIDHQ EAARRNGLTM QPAKVIVFGT PKAGTPLMVK 

101 DPAFALQLPL RVXVTETDGK VRAAYTDTRA LIAGSRIGFD EVANTLANAE 

151 KLIQKTIGE* 

ORF97a and ORF97-1 show 95.6% identity in 159 aa overlap: 



10 20 30 40 50 60 

orf97a.pep MXHILPLXXASALCISTASXHPASEPQTQNETAMTTHTLTSKYSFDETVSRLETAIKSKG
I Mill MINIM II I i I I I h I I I I II II I I I I M I I I I I I I I I I I I I ! I I I I I 
orf97-1 MKHILPLIAASALCISTASAHPASEPSTQNETAMTTHTLTSKYSFDETVSRLETAIKSKG

10 20 30 40 50 60 



70 80 90 100 110 120 

orf97a.pep MDIFAVIDHQEAARRNGLTMQPAKVIVFGTPKAGTPLMVKDPAFALQLPLRVXVTETDGK
I I I II I I I I I II I I 1 M I I II M I I I I I I I I II I I II I I I I I II II I I I I I I II I II M 
orf97-1 MDIFAVIDHQEAARRNGLTMQPAKVIVFGTPKAGTPLMVKDPAFALQLPLRVLVTETDGK

70 80 90 100 110 120 



130 140 150 160 

orf97a.pep VRAAYTDTRALIAGSRIGFDEVANTLANAEKLIQKTIGEX

I 1 1 I I I I I I I I t I I I I I I 1 1 I I I I I I I I I I I I I I 1 I : I I 1 
orf97-1 VRAAYTDTRALIAGSRIGFDEVANTLANAEKLIQKTVGEX
130 140 150 160 



Homology with a predicted ORF from N. gonorrhoeae 

ORF97 shows 88.1% identity over a 159aa overlap with a predicted ORF (ORF97.ng) from N.
gonorrhoeae:



orf97.pep MKHILPLIAASALCISTASAHPASEPSTQNETAMITHTLISKYSFGXXXXXXXXAIKSKG 60

MIMI M I I I : M I I I I I I M : : I M I II M Kill Mill : MIIMI 
orf97ng MKHILPPIAASAFCISTASAHPAGKPPTQNETAMTTHTLTSKYSFDETVSRLETAIKSKG 60 

orf97.pep MDIFAVIDHQEAARRNGLTMQPAKVIVFGTPKAGTPLMVKDPAFALQLPLRVLVTETDGK 120

I I II I M II I I II M I I I I II II I I II II II I II II I I I II M I I I II I I I II I I I I I I I 
orf97ng MDIFAVIDHQEAARRNGLTMQPAKVIVFGTPKAGTPLMVKDPAFALQLPLRVLVTETDGK 120

orf97.pep VRAAYTDTRALIAGSRIGFDEVANTLANAEKLIQKTVGE 159

I I: M I I I 11 M:M I MM M M I I I I I I Ml II M I I 
orf97ng VRTAYTDTRALIVGSRISFDEVANTLANAEKLIQKTVGE 159
The complete length ORF97ng nucleotide sequence <SEQ ID 361> is predicted to encode a protein
having amino acid sequence <SEQ ID 362>: 

1 MKHILPPIAA SAFCISTASA HPAGKPPTQN ETAMTTHTLT SKYSFDETVS 
51 RLETAIKSKG MDIFAVIDHQ EAARRNGLTM QPAKVIVFGT PKAGTPLMVK 
101 DPAFALQLPL RVLVTETDGK VRTAYTDTRA LIVGSRISFD EVANTLANAE

151 KLIQKTVGE* 

Further work revealed the complete nucleotide sequence <SEQ ID 363>: 

1 ATGAAACACA TACTCCCcct gatcgccgca TccgcactCT GCATTTCAAC 

51 CGCTTCGGCA CACCCTGCCG GCAAACCGCC CACCCAAAAC GAAACCGCTA 

101 TGACCACGCA CACCCTCACC TCGAAATACA GTTTTGACGA AACCGTCAGC

151 CGCCTTGAAA CCGCCATAAA AAGCAAAGGG ATGGACATTT TTGCCGTCAT 

201 CGACCATCAG GAAGCGGCAC GCCGAAACGG CCTGACCATG CAGCCGGCAA 

251 AAGTCATCGT CTTCGGCACG CCCAAGGCCG GTACGCCgct GATGGTCAAA 

301 GACCCCGCCT TCGCCCTGCA ACTGCCCCTG CGCGTCCTCG TTACCGAAAC 

351 GGACGGCAAA GTACGCACCG CCTATACCGA TACGCGCGCC CTCATCGTCG

401 GCAGCCGCAT CAGTTTCGAC GAAGTGGCAA ACACTTTGGC AAACGCCGAA 

451 AAACTGATAC AAAAAACCGT AGGCGAATAA 

This corresponds to the amino acid sequence <SEQ ID 364; ORF97ng-1>:

1 MKHILPLIAA SALCISTASA HPAGKPPTQN ETAMTTHTLT SKYSFDETVS 

51 RLETAIKSKG MDIFAVIDHQ EAARRNGLTM QPAKVIVFGT PKAGTPLMVK

101 DPAFALQLPL RVLVTETDGK VRTAYTDTRA LIVGSRISFD EVANTLANAE 

151 KLIQKTVGE* 

ORF97ng-1 and ORF97-1 show 96.2% identity in 159 aa overlap:

10 20 30 40 50 60 

orf97-1.pep MKHILPLIAASALCISTASAHPASEPSTQNETAMTTHTLTSKYSFDETVSRLETAIKSKG

I I M I I ! I I I I I I I I I I I I I I I i : : I I I I I I I I I I I I I I I 11 I I I I I I I I I I I I I I I I I 
orf97ng-1 MKHILPLIAASALCISTASAHPAGKPPTQNETAMTTHTLTSKYSFDETVSRLETAIKSKG

10 20 30 40 50 60 

70 80 90 100 110 120

orf97-1.pep MDIFAVIDHQEAARRNGLTMQPAKVIVFGTPKAGTPLMVKDPAFALQLPLRVLVTETDGK
I II I M I MM II II I IMIMI I I I I Mill M I I I I II Ml I I II II I Mill Mill 
orf97ng-1 MDIFAVIDHQEAARRNGLTMQPAKVIVFGTPKAGTPLMVKDPAFALQLPLRVLVTETDGK

70 80 90 100 110 120 


130 140 150 160 

orf97-1.pep VRAAYTDTRALIAGSRIGFDEVANTLANAEKLIQKTVGEX
I I: I I I I I I I MM I I Ml I I II I I I I I I I I I I I I I I I I I 
orf97ng-1 VRTAYTDTRALIVGSRISFDEVANTLANAEKLIQKTVGEX
130 140 150 160

Based on this analysis, including the presence of a putative leader sequence in the gonococcal 
protein, it was predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their 
epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies. 

ORF97-1 (15.3kDa) was cloned in pET and pGex vectors and expressed in E. coli, as described
above. The products of protein expression and purification were analyzed by SDS-PAGE. Figures
12A & 12B show, respectively, the results of affinity purification of the GST-fusion and His-fusion
proteins. Purified GST-fusion protein was used to immunise mice, whose sera were used for
Western Blot (Figure 12C), ELISA (positive result), and FACS analysis (Figure 12D). These
experiments confirm that ORF97-1 is a surface-exposed protein, and that it is a useful immunogen.
Figure 12E shows plots of hydrophilicity, antigenic index, and AMPHI regions for ORF97-1.
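Figure 12E is not reproduced here, and the patent does not state which hydrophilicity scale or window length was used. As an illustration only, a hydrophilicity profile is commonly computed as a sliding-window mean of per-residue values. The sketch below assumes the Hopp-Woods scale with a six-residue window (the hexapeptide averaging of the original Hopp-Woods method) and scores undetermined residues as 0.0; it is not a reconstruction of the figure.

```python
# Hopp-Woods hydrophilicity values (assumed scale; the patent does not name one).
HOPP_WOODS = {
    "R": 3.0, "D": 3.0, "E": 3.0, "K": 3.0, "S": 0.3, "N": 0.2, "Q": 0.2,
    "G": 0.0, "P": 0.0, "T": -0.4, "A": -0.5, "H": -0.5, "C": -1.0,
    "M": -1.3, "V": -1.5, "I": -1.8, "L": -1.8, "Y": -2.3, "F": -2.5, "W": -3.4,
}


def hydrophilicity_profile(seq, window=6):
    """Mean scale value for each window, indexed by the window's first residue."""
    values = [HOPP_WOODS.get(aa, 0.0) for aa in seq.upper()]  # 'X' etc. score 0.0
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]


def most_hydrophilic_window(seq, window=6):
    """1-based start position and score of the highest-scoring window."""
    profile = hydrophilicity_profile(seq, window)
    best = max(range(len(profile)), key=profile.__getitem__)
    return best + 1, profile[best]


# Residues 1-60 of ORF97-1 (<SEQ ID 358>).
segment = "MKHILPLIAASALCISTASAHPASEPSTQNETAMTTHTLTSKYSFDETVSRLETAIKSKG"
print(most_hydrophilic_window(segment))
```

Swapping in another scale only requires replacing the dictionary; the windowing logic is unchanged.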
Example 43 

The following DNA sequence, believed to be complete, was identified in N. meningitidis <SEQ ID
365>:

1 ATGGCTTTTA TTACGCGCTT ATTCAAAAGC AGTAAATGGC TGATTGTGCC 
51 GCTGATGCTC CCCGCCTTTC AGAATGTGGC GGCGGAGGGG ATAGATGTGA 
101 GCCGTGCCGA AGCGAGGATA ACCGACGGCG GGCAGCTTTC CATCAGCAGC 
151 CGCTTCCAAA CCGAGCTGCC CGACCAGCTC CAACAGGCGT TGCGCCGGGg 
201 CGTGCCGCTC AACTTTACCT TAAGCTGGCA GCTTTCCGCC CCGATAATCG 
251 CTTCTTATCG GTTTAAATTG GGGCAACTGA TTGGCGATGA CGACaATATT 
301 GACTACAAAC TGAGTTTCCA TCCGCTGACc AaACGCTACC GCGTTACCgT 
351 CGgCGCGTTT TCGACAGACT ACGACACCTT GGATGCGGCA TTGCGCGCGA 
401 CCGGCGCGGT TGCCAACTGG AAAGTCCTGA ACAAAGGCGC GCTGTCCGGT 
451 GCGGAAGCAG GGGAAACCAA GGCGGAAATC CGCCTGACGC TGTCCACTTC 
501 AAAACTGCCC AAGCCTTTTC AAATCAATGC ATTGACTTCT CAAAACTGGC 
551 ATTTGGATTC GGGTTGGAAA CCTCTAAACA TCATCGGGAA CAAATAA 

This corresponds to the amino acid sequence <SEQ ID 366; ORF106>: 

1 MAFITRLFKS SKWLIVPLML PAFQNVAAEG IDVSRAEARI TDGGQLSISS 

51 RFQTELPDQL QQALRRGVPL NFTLSWQLSA PIIASYRFKL GQLIGDDDNI 

101 DYKLSFHPLT KRYRVTVGAF STDYDTLDAA LRATGAVANW KVLNKGALSG 

151 AEAGETKAEI RLTLSTSKLP KPFQINALTS QNWHLDSGWK PLNIIGNK* 

Further work revealed the following DNA sequence <SEQ ID 367>: 

1 ATGGCTTTTA TTACGCGCTT ATTCAAAAGC AGTAAATGGC TGATTGTGCC 

51 GCTGATGCTC CCCGCCTTTC AGAATGTGGC GGCGGAGGGG ATAGATGTGA 

101 GCCGTGCCGA AGCGAGGATA ACCGACGGCG GGCAGCTTTC CATCAGCAGC 

151 CGCTTCCAAA CCGAGCTGCC CGACCAGCTC CAACAGGCGT TGCGCCGGGG 

201 CGTGCCGCTC AACTTTACCT TAAGCTGGCA GCTTTCCGCC CCGATAATCG 

251 CTTCTTATCG GTTTAAATTG GGGCAACTGA TTGGCGATGA CGACAATATT 

301 GACTACAAAC TGAGTTTCCA TCCGCTGACC AACCGCTACC GCGTTACCGT 

351 CGGCGCGTTT TCGACAGACT ACGACACCTT GGATGCGGCA TTGCGCGCGA 

401 CCGGCGCGGT TGCCAACTGG AAAGTCCTGA ACAAAGGCGC GCTGTCCGGT 

451 GCGGAAGCAG GGGAAACCAA GGCGGAAATC CGCCTGACGC TGTCCACTTC 

501 AAAACTGCCC AAGCCTTTTC AAATCAATGC ATTGACTTCT CAAAACTGGC 

551 ATTTGGATTC GGGTTGGAAA CCTCTAAACA TCATCGGGAA CAAATAA 

This corresponds to the amino acid sequence <SEQ ID 368; ORF106-1>: 

1 MAFITRLFKS SKWLIVPLML PAFQNVAAEG IDVSRAEARI TDGGQLSISS

51 RFQTELPDQL QQALRRGVPL NFTLSWQLSA PIIASYRFKL GQLIGDDDNI 

101 DYKLSFHPLT NRYRVTVGAF STDYDTLDAA LRATGAVANW KVLNKGALSG 

151 AEAGETKAEI RLTLSTSKLP KPFQINALTS QNWHLDSGWK PLNIIGNK* 
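The listings in this document share a fixed layout: a position number, then groups of ten residues, with `*` marking the stop codon. A small parser (a hypothetical helper, not part of the patent) that flattens this layout makes it easy to confirm that <SEQ ID 366> and <SEQ ID 368> differ at a single position, the K to N change at residue 111 discussed in the strain A comparison.

```python
def parse_listing(text):
    """Flatten a numbered listing into one sequence string.

    A leading position number on each line is dropped, grouping spaces are
    removed, and a trailing '*' (stop codon) is stripped.
    """
    residues = []
    for line in text.splitlines():
        tokens = line.split()
        if tokens and tokens[0].isdigit():
            tokens = tokens[1:]
        residues.extend(tokens)
    return "".join(residues).rstrip("*")


def differences(a, b):
    """1-based positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return [i + 1 for i, (x, y) in enumerate(zip(a, b)) if x != y]


ORF106 = parse_listing("""
1 MAFITRLFKS SKWLIVPLML PAFQNVAAEG IDVSRAEARI TDGGQLSISS
51 RFQTELPDQL QQALRRGVPL NFTLSWQLSA PIIASYRFKL GQLIGDDDNI
101 DYKLSFHPLT KRYRVTVGAF STDYDTLDAA LRATGAVANW KVLNKGALSG
151 AEAGETKAEI RLTLSTSKLP KPFQINALTS QNWHLDSGWK PLNIIGNK*
""")
ORF106_1 = parse_listing("""
1 MAFITRLFKS SKWLIVPLML PAFQNVAAEG IDVSRAEA RI TDGGQLSISS
51 RFQTELPDQL QQALRRGVPL NFTLSWQLSA PIIASYRFKL GQLIGDDDNI
101 DYKLSFHPLT NRYRVTVGAF STDYDTLDAA LRATGAVANW KVLNKGALSG
151 AEAGETKAEI RLTLSTSKLP KPFQINALTS QNWHLDSGWK PLNIIGNK*
""")

for pos in differences(ORF106, ORF106_1):
    print(pos, ORF106[pos - 1], "->", ORF106_1[pos - 1])
# prints: 111 K -> N
```

Because the grouping spaces are discarded, stray OCR spaces inside a group (as in `IDVSRAEA RI`) do not affect the result.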

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A) 

ORF106 shows 87.4% identity over a 199aa overlap with an ORF (ORF106a) from strain A of N. 
meningitidis: 

10 20 30 40 50 59 

orf106.pep MAFITRLFKSSK-WLIVPLMLPAFQNVAAEGIDVSRAEARITDGGQLSISSRFQTELPDQ
| I I I I I I I I I i II:: II : : : : I I I I I II I I I I I I I : I I II I I I I I I I I I I I I 
orf106a MAFITRLFKSIKQWLVLLPMLSVLPDAAAEGIDVSRAEARIXDGGQLSXXSRFQTELPDQ

10 20 30 40 50 60 



60 70 80 90 100 110 119

orf106.pep LQQALRRGVPLNFTLSWQLSAPIIASYRFKLGQLIGDDDNIDYKLSFHPLTKRYRVTVGA
II | Ml II II I I I I I I I I I I I I I MINIMI 1 II II I I I I I | : | | || | || | 
orf106a LQXAXXRGVXLNXTLXWQLSAPIIASYRFXLGQLIGDDDXIDYKLSFHPLTNRYRVTVGA

70 80 90 100 110 120 

120 130 140 150 160 170 179 

orf106.pep FSTDYDTLDAALRATGAVANWKVLNKGALSGAEAGETKAEIRLTLSTSKLPKPFQINALT

Mi I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I 
orf106a FSTXYDTLDAALRATGAVANWKVLNKGALSGAEAGETKAEIRLTLSTSKLPKPFQINALT

130 140 150 160 170 180



180 190 199

orf106.pep SQNWHLDSGWKPLNIIGNKX
I I I I I I I I I I I I I I I I I I I I 
orf106a SQNWHLDSGWKPLNIIGNKX

190 200 



Due to the K->N substitution at residue 111, the homology between ORF106a and ORF106-1 is
87.9% over the same 199 aa overlap.
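The 87.9% figure follows directly from the 87.4% one. Over a 199-column overlap, 87.4% corresponds to 174 identical positions, and one more identity gives 175/199. A minimal sketch of that arithmetic follows, assuming the reported percentages are rounded to one decimal place. The same step turns the 90.5% reported for ORF106ng into the 91.0% quoted for its comparison with ORF106-1.

```python
def implied_identities(percent, overlap):
    """Number of identical positions implied by a percentage rounded to 0.1%."""
    return round(percent / 100 * overlap)


def identity_after_gain(percent, overlap, gained=1):
    """Percent identity once `gained` mismatched positions become identical."""
    return round(100 * (implied_identities(percent, overlap) + gained) / overlap, 1)


print(implied_identities(87.4, 199), identity_after_gain(87.4, 199))  # ORF106a
print(implied_identities(90.5, 199), identity_after_gain(90.5, 199))  # ORF106ng
```

With a 199-column overlap, one position is worth about 0.5%, so a percentage rounded to 0.1% identifies the count unambiguously.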



The complete length ORF106a nucleotide sequence <SEQ ID 369> is:



1 ATGGCTTTTA TTACGCGCTT ATTCAAAAGC ATTAAACAAT GGCTTGTGCT 

51 GCTGCCGATG CTTTCCGTTT TGCCGGACGC GGCGGCGGAG GGGATAGATG 

101 TGAGCCGCGC CGAAGCGAGG ATAANCGACG GCGGGCAGCT TTGCATNAGN

151 AGCCGCTTCC AAACCGAGCT GCCCGACCAG CTCCAANNNG CGNNGNGCCG 

201 GGGCGTGNCG CTCAACTNTA CCTTAAGNTG GCAGCTTTCC GCCCCGATAA 

251 TCGCTTCTTA TCGGTTTNAA TTGGGGCAAC TGATTGGCGA TGACGACNAT 

301 ATTGACTACA AACTGAGTTT CCATCCGCTG ACCAACCGCT ACCGCGTTAC 

351 CGTCGGCGCG TTTTCGACAG ANTACGACAC CTTGGATGCG GCATTGCGCG

401 CGACCGGCGC GGTTGCCAAC TGGAAAGTCC TGAACAAAGG CGCGCTGTCC 

451 GGTGCGGAAG CAGGGGAAAC CAAGGCGGAA ATCCGCCTGA CGCTGTCCAC 

501 TTCAAAACTG CCCAAGCCTT TTCAAATCAA TGCATTGACT TCTCAAAACT 

551 GGCATTTGGA TTCGGGTTGG AAACCTCTAA ACATCATCGG GAACAAATAA 

This encodes a protein having amino acid sequence <SEQ ID 370>:



1 MAFITRLFKS IKQWLVLLPM LSVLPDAAAE GIDVSRAEAR IXDGGQLSXX

51 SRFQTELPDQ LQXAXXRGVX LNXTLXWQLS APIIASYRFX LGQLIGDDDX 

101 IDYKLSFHPL TNRYRVTVGA FSTXYDTLDA ALRATGAVAN WKVLNKGALS 

151 GAEAGETKAE IRLTLSTSKL PKPFQINALT SQNWHLDSGW KPLNIIGNK* 

Homology with a predicted ORF from N. gonorrhoeae

ORF106 shows 90.5% identity over a 199aa overlap with a predicted ORF (ORF106.ng) from N. 
gonorrhoeae: 



orf106.pep MAFITRLFKSSK-WLIVPLMLPAFQNVAAEGIDVSRAEARITDGGQLSISSRFQTELPDQ 59
I I I I I I I I I I I I I :: : I :: : : I I I I I : : I I II I I I I 11 : I I I I I I I I I M I I I 
orf106ng MAFITRLFKSIKQWLVLLPILSVLPDAAAEGIAATRAEARITDGGRLSISSRFQTELPDQ 60

orf106.pep LQQALRRGVPLNFTLSWQLSAPIIASYRFKLGQLIGDDDNIDYKLSFHPLTKRYRVTVGA 119
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I t I I I I I I I I I I I I I I I I I I I I : II I I I I I I 
orf106ng LQQALRRGVPLNFTLSWQLSAPTIASYRFKLGQLIGDDDNIDYKLSFHPLTNRYRVTVGA 120

orf106.pep FSTDYDTLDAALRATGAVANWKVLNKGALSGAEAGETKAEIRLTLSTSKLPKPFQINALT 179
I M I 1 I 1 I 11 I I I II I I I I I! I I I I I I I I I I I M I t I I I I I M I I I I I I I I I I I M I I I! 
orf106ng FSTDYDTLDAALRATGAVANWKVLNKGALSGAEAGETKAEIRLTLSTSKLPKPFQINALT 180

orf106.pep SQNWHLDSGWKPLNIIGNK 198
Mill I I I lllllll II I I 
orf106ng SQNWHLDSGWKPLNIIGNK 199



Due to the K->N substitution at residue 111, the homology between ORF106ng and ORF106-1 is
91.0% over the same 199 aa overlap.
The complete length ORF106ng nucleotide sequence <SEQ ID 371> is:

1 ATGGCTTTTA TTACGCGCTT ATTCAAAAGC ATTAAACAAT GGCTTGTGCT 

51 GTTGCCGATA CTCTCCGTTT TGCCGGACGC GGCGGCGGAG GGCATTGCCG 

101 CGACCCGCGC CGAAGCGAGG ATAACCGACG GCGGGCGGCT TTCCATCAGC 

151 AGCCGCTTCC AAACCGAGCT GCCCGACCAG CTCCAACAGG CGTTGCGCCG 

201 GGGCGTACCG CTCAACTTTA CCTTAAGCTG GCAGCTTTCC GCCCCGACAA 

251 TCGCTTCTTA TCGGTTTAAA TTGGGGCAAC TGATTGGCGA TGACGACAAT 

301 ATTGACTACA AACTAAGTTT CCATCCGCTG ACCAACCGCT ACCGCGTTAC 

351 CGTCGGCGCA TTTTCCACCG ATTACGACAC TTTGGATGCG GCATTGCGCG 

401 CGACCGGCGC GGTTGCCAAC TGGAAAGTCC TGAACAAAGG CGCGTTGTCC 

451 GGTGCGGAAG CAGGGGAAAC CAAGGCGGAA ATCCGCCTGA CGCTGTCCAC 

501 TTCAAAACTG CCCAAGCCTT TCCAAATCAA CGCATTGACT TCTCAAAACT 

551 GGCATTTGGA TTCGGGTTGG AAACCTCTAA ACATCATCGG GAACAAATAA 

This encodes a protein having amino acid sequence <SEQ ID 372>: 

1 MAFITRLFKS IKQWLVLLPI LSVLPDAAAE GIAATRAEAR ITDGGRLSIS

51 SRFQTELPDQ LQQALRRGVP LNFTLSWQLS APTIASYRFK LGQLIGDDDN 

101 IDYKLSFHPL TNRYRVTVGA FSTDYDTLDA ALRATGAVAN WKVLNKGALS 

151 GAEAGETKAE IRLTLSTSKL PKPFQINALT SQNWHLDSGW KPLNIIGNK* 

Based on this analysis, including the presence of a putative leader sequence in the gonococcal 
protein, it was predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their 
epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies. 

ORF106-1 (18kDa) was cloned in pET and pGex vectors and expressed in E. coli, as described
above. The products of protein expression and purification were analyzed by SDS-PAGE. Figure
13A shows the results of affinity purification of the His-fusion protein, and Figure 13B shows the
results of expression of the GST-fusion in E. coli. Purified His-fusion protein was used to immunise
mice, whose sera were used for FACS analysis (Figure 13C). These experiments confirm that
ORF106-1 is a surface-exposed protein, and that it is a useful immunogen.
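The sizes quoted for the recombinant proteins (15.3 kDa for ORF97-1, 18 kDa for ORF106-1) are given without a method; the patent does not say, for example, whether the leader peptide or a fusion tag was counted. For comparison purposes only, a theoretical average mass is usually estimated by summing average residue masses and adding one water. The sketch below uses commonly tabulated average residue masses (assumed values, not taken from the patent) and is not expected to reproduce the quoted figures exactly.

```python
# Average residue masses in daltons (free amino acid minus one water), as
# commonly tabulated; assumed values, not taken from the patent.
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.0153  # added once for the free N- and C-termini


def average_mass(seq):
    """Theoretical average mass (Da) of an unmodified linear polypeptide."""
    seq = seq.upper().rstrip("*")  # drop a trailing stop-codon marker
    unknown = set(seq) - set(RESIDUE_MASS)
    if unknown:
        raise ValueError(f"cannot weigh undetermined residues: {sorted(unknown)}")
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER


print(f"{average_mass('MKHILPLIAA') / 1000:.2f} kDa")
```

Residues shown as `X` are rejected rather than guessed, so a sequence such as ORF106a would need its undetermined positions resolved first.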

Example 44 

The following DNA sequence, believed to be complete, was identified in N. meningitidis <SEQ ID 373>:



1 ATGGACACAA AAGAAATCCT CGG.TACGCG GcAGGcTCGA TCGGCAGCGC

51 GGTTTTAGCC GTCATCATCc TGCCGCTGCT GTCGTGGTAT TTCCCCGCCG 

101 ACGACATCGG GCGCATCGTG CTGATGCAGA CGGCGGCGGG GCTgACGGTG 

151 TCGGTGTTGT GCCTCGGGCT GGATCAGGCA TACGTCCGCG AATACTATGC 

201 CACCGCCGAC AAAGACAcCT TGTTCAAAAC CCTGTTCCTG CCGCCGCTGC 

251 TGTCTGCCGC CGCGATAGCC GCCCTGCTGC TTTCCCGCCC GTCCCTGCCG 

301 TCTGAAATCC TGTTTTCACT CGACGATGCC gCCGCCGGCa TCGGGCTGGT 

351 GCTGTTTGAA CtGAGCTTCC TGCCCATCCG cTTTCTCTTA CTGGTTTTGC 

401 GTATGGAAGG ACGCGCCcTT GCCTTTTCGT CCGCGCAACT CGTGCcCAAG 

451 CTCGCCATCC TGCTGCTG.T GCCGCTGACG GTCGGGCTGC TGCACTTTCC 

501 AGCGAACACC GCCGTCCTGA CCGCCGTTTA CGCGCTGGCA AACCTTGCCG 

551 CCGCCGCCTT TTTGCTGTTT CAAAACCGAT GCCGTCTGAA GGCCGTCCGG 

601 CACGCACCGT TTTCGCCCGC CGTCCTGCAC CGGGGG.TGC GCTACGGCAT 

651 ACCGATCGCA CTGAGCAGCA TCGCCTATTG GGGGCTGGCA TCCGCCGACC 

701 GTTTGTTCCT GAAAAAATAT GCCGGCCTGG AACAGCTCGG CGTTTATTCG 

751 ATGGGTATTT CGTTCGGCGG GGCGGCATTA TTGTTCCAAA GCATCTTTTC 

801 AACGGTCTGG ACACCGTATA TTTTCCGCGC AATCGAAGAA AACGCCCCGC 

851 CCGCTCGCCT CTCGGCAACG GCAGAATCCG CCGCCGCCCT GCTTGCCTCC 

901 GCCCTCTGC. TGACCGGCAT TTTCTCGCCC CTTGCCTCCC TCCTGCTGCC 

951 GGAAAACTAC GCCGCCGTCC GGTTTATCGT CGTATCGTGT ATG.TGCCGC 
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1001 CGCTGTTTTG CACGCTGGCG GAAATCAGCG GCATCGGTTT GAACGTCGTT 

1051 CGCAAAACGC GCCCGATCGC GCTCGCCACC TTGGGCGCGC TGGCGGCAAA 

1101 CCTGCTGCTG CTGGGGCTTG ACCGTGCCGT ACCGGCGAGG CCGCC.GGCG 

1151 CGGCGGTTGC CTGTGCCGCC TCATTCTGGC TGTTTTTTGC CTTCAAGACC 

1201 GAAAGCTCyT GCCGCCTGTG GCAGCCGCTC AAACGCCTGC CGCTTTATCT 

1251 GCACACATTG TTCTGCCTGA CCTCCTCGGC GGCCTACACC TGCTTCGGCA 

1301 CGCCGGCAAA CTATCCCCTG TTTGCCGGCG TATGGGCGGC ATATCTGGCA 

1351 GGCTGCATCC TGCGCCACCG GAAAGATTTG CACAAACTGT TTCATTATTT 

1401 GAAAAAACAA GGTTTCCCAT TATGA 

This corresponds to the amino acid sequence <SEQ ID 374; ORF10>:



1 MDTKEILXYA AGSIGSAVLA VIILPLLSWY FPADDIGRIV LMQTAAGLTV 

51 SVLCLGLDQA YVREYYATAD KDTLFKTLFL PPLLSAAAIA ALLLSRPSLP 

101 SEILFSLDDA AAGIGLVLFE LSFLPIRFLL LVLRMEGRAL AFSSAQLVPK 

151 LAILLLXPLT VGLLHFPANT AVLTAVYALA NLAAAAFLLF QNRCRLKAVR 

201 HAPFSPAVLH RGXRYGIPIA LSSIAYWGLA SADRLFLKKY AGLEQLGVYS 

251 MGISFGGAAL LFQSIFSTVW TPYIFRAIEE NAPPARLSAT AESAAALLAS 

301 ALCXTGIFSP LASLLLPENY AAVRFIVVSC MXPPLFCTLA EISGIGLNVV

351 RKTRPIALAT LGALAANLLL LGLDRAVPAR PXGAAVACAA SFWLFFAFKT 

401 ESSCRLWQPL KRLPLYLHTL FCLTSSAAYT CFGTPANYPL FAGVWAAYLA 

451 GCILRHRKDL HKLFHYLKKQ GFPL* 

Further sequence analysis revealed the complete DNA sequence <SEQ ID 375> to be:



1 ATGGACACAA AAGAAATCCT CGGCTACGCG GCAGGCTCGA TCGGCAGCGC 

51 GGTTTTAGCC GTCATCATCC TGCCGCTGCT GTCGTGGTAT TTCCCCGCCG 

101 ACGACATCGG GCGCATCGTG CTGATGCAGA CGGCGGCGGG GCTGACGGTG 

151 TCGGTGTTGT GCCTCGGGCT GGATCAGGCA TACGTCCGCG AATACTATGC 

201 CACCGCCGAC AAAGACACCT TGTTCAAAAC CCTGTTCCTG CCGCCGCTGC 

251 TGTCTGCCGC CGCGATAGCC GCCCTGCTGC TTTCCCGCCC GTCCCTGCCG 

301 TCTGAAATCC TGTTTTCACT CGACGATGCC GCCGCCGGCA TCGGGCTGGT 

351 GCTGTTTGAA CTGAGCTTCC TGCCCATCCG CTTTCTCTTA CTGGTTTTGC 

401 GTATGGAAGG ACGCGCCCTT GCCTTTTCGT CCGCGCAACT CGTGCCCAAG 

451 CTCGCCATCC TGCTGCTGCT GCCGCTGACG GTCGGGCTGC TGCACTTTCC

501 AGCGAACACC GCCGTCCTGA CCGCCGTTTA CGCGCTGGCA AACCTTGCCG 

551 CCGCCGCCTT TTTGCTGTTT CAAAACCGAT GCCGTCTGAA GGCCGTCCGG 

601 CACGCACCGT TTTCGCCCGC CGTCCTGCAC CGGGGGCTGC GCTACGGCAT 

651 ACCGATCGCA CTGAGCAGCA TCGCCTATTG GGGGCTGGCA TCCGCCGACC 

701 GTTTGTTCCT GAAAAAATAT GCCGGCCTGG AACAGCTCGG CGTTTATTCG 

751 ATGGGTATTT CGTTCGGCGG GGCGGCATTA TTGTTCCAAA GCATCTTTTC 

801 AACGGTCTGG ACACCGTATA TTTTCCGCGC AATCGAAGAA AACGCCCCGC 

851 CCGCCCGCCT CTCGGCAACG GCAGAATCCG CCGCCGCCCT GCTTGCCTCC 

901 GCCCTCTGCC TGACCGGCAT TTTCTCGCCC CTTGCCTCCC TCCTGCTGCC 

951 GGAAAACTAC GCCGCCGTCC GGTTTATCGT CGTATCGTGT ATGCTGCCGC 

1001 CGCTGTTTTG CACGCTGGCG GAAATCAGCG GCATCGGTTT GAACGTCGTC 

1051 CGCAAAACGC GCCCGATCGC GCTCGCCACC TTGGGCGCGC TGGCGGCAAA 

1101 CCTGCTGCTG CTGGGGCTTG CCGTGCCGTC CGGCGGCGCG CGCGGCGCGG 

1151 CGGTTGCCTG TGCCGCCTCA TTCTGGCTGT TTTTTGCCTT CAAGACCGAA 

1201 AGCTCCTGCC GCCTGTGGCA GCCGCTCAAA CGCCTGCCGC TTTATCTGCA 

1251 CACATTGTTC TGCCTGACCT CCTCGGCGGC CTACACCTGC TTCGGCACGC 

1301 CGGCAAACTA TCCCCTGTTT GCCGGCGTAT GGGCGGCATA TCTGGCAGGC 

1351 TGCATCCTGC GCCACCGGAA AGATTTGCAC AAACTGTTTC ATTATTTGAA 

1401 AAAACAAGGT TTCCCATTAT GA 

This corresponds to the amino acid sequence <SEQ ID 376; ORF10-1>: 



1 MDTKEILGYA AGSIGSAVLA VIILPLLSWY FPADDIGRIV LMQTAAGLTV

51 SVLCLGLDQA YVREYYATAD KDTLFKTLFL PPLLSAAAIA ALLLSRPSLP

101 SEILFSLDDA AAGIGLVLFE LSFLPIRFLL LVLRMEGRAL AFSSAQLVPK

151 LAILLLLPLT VGLLHFPANT AVLTAVYALA NLAAAAFLLF QNRCRLKAVR

201 HAPFSPAVLH RGLRYGIPIA LSSIAYWGLA SADRLFLKKY AGLEQLGVYS

251 MGISFGGAAL LFQSIFSTVW TPYIFRAIEE NAPPARLSAT AESAAALLAS

301 ALCLTGIFSP LASLLLPENY AAVRFIVVSC MLPPLFCTLA EISGIGLNVV

351 RKTRPIALAT LGALAANLLL LGLAVPSGGA RGAAVACAAS FWLFFAFKTE

401 SSCRLWQPLK RLPLYLHTLF CLTSSAAYTC FGTPANYPLF AGVWAAYLAG 

451 CILRHRKDLH KLFHYLKKQG FPL* 



Computer analysis of this amino acid sequence gave the following results: 



Prediction 

ORF10-1 is predicted to be the precursor of an integral membrane protein, since it comprises
several (12-13) potential transmembrane segments and a probable cleavable signal peptide.
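The patent does not say which program produced the estimate of 12-13 transmembrane segments. One classic criterion, used here as an assumption, is the Kyte-Doolittle hydropathy test: a 19-residue window whose mean hydropathy exceeds about 1.6 marks a candidate membrane-spanning stretch, and overlapping flagged windows are merged into one segment. The sketch is a screening heuristic under those assumptions, not a reproduction of the patent's analysis; undetermined residues score 0.0.

```python
# Kyte-Doolittle hydropathy values (assumed scale).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def candidate_tm_segments(seq, window=19, threshold=1.6):
    """Merge overlapping windows whose mean hydropathy exceeds `threshold`.

    Returns 0-based, half-open (start, end) intervals.
    """
    values = [KYTE_DOOLITTLE.get(aa, 0.0) for aa in seq.upper()]  # 'X' scores 0.0
    segments = []
    for i in range(len(values) - window + 1):
        if sum(values[i:i + window]) / window > threshold:
            start, end = i, i + window
            if segments and start <= segments[-1][1]:
                # Windows are scanned left to right, so this one extends the last segment.
                segments[-1] = (segments[-1][0], end)
            else:
                segments.append((start, end))
    return segments


# Residues 1-50 of ORF10-1 (<SEQ ID 376>).
print(candidate_tm_segments("MDTKEILGYAAGSIGSAVLAVIILPLLSWYFPADDIGRIVLMQTAAGLTV"))
```

Merged windows overestimate segment length by up to a window width on each side; dedicated predictors refine the boundaries, which this sketch does not attempt.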

Homology with EpsM from Streptococcus thermophilus (accession number U40830). 
ORF10 shows homology with the epsM gene of S. thermophilus, which encodes a protein of similar
size to ORF10 and is involved in exopolysaccharide synthesis. Other homologies are with
prokaryotic membrane proteins:

Identities = (25%) 

Query: 213 LRYGIPLALSSLAYWGLASADRLFLKKYAGLEQLGVYSMGISFGGAALLLQSIFSTVW 270

L Y +PL SS+ +W L ++ R F+ + G G+ ++ + +IF+ W 

Sbjct: 210 LYYALPLIPSSILWWLLNASSRYFVLFFLGAGANGLLAVATKIPSIISIFNTIFTQAW 267






Identities = 15/57 (26%), Positives = 31/57 (54%)

Query: 7 LGYAAGSIGSAVLAVIILPLLSWYFPADDIGRIVLMQTAAGLTVSVLCLGLDQAYVR 63 

L + G++GS +L +++PL ++ + G L QT A L + ++ + + A +R 

Sbjct: 12 LVFTIGNLGSKLLVFLLVPLYTYAMTPQEYGMADLYQTTANLLLPLITMNVFDATLR 68

Identities = 16/96 (16%), Positives = 36/96 (37%) 

Query: 307 IFSPLASLLLPENYAAVRFTVVSCMLPPLFYTLTEISGIGLNVVRKTRPIXXXXXXXXXX 366

+ P+ ++ +YA+ V ML LF + ++ G ++T+ + 

Sbjct: 305 VLKPIVEKWSSDYASSWQYVPFFMLSMLFSSFSDFFGTNYIAAKQTKGVFMTSIYGTIV 364 

Homology with a predicted ORF from N. meningitidis (strain A) 

ORF10 shows 95.4% identity over a 475aa overlap with an ORF (ORF10a) from strain A of N. meningitidis:



10 20 30 40 50 60

orf10.pep MDTKEILXYAAGSIGSAVLAVIILPLLSWYFPADDIGRIVLMQTAAGLTVSVLCLGLDQA
I I ! I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
orf10a MDTKEILGYAAGSIGSAVLAVIILPLLSWYFPADDIGRIVLMQTAAGLTVSVLCLGLDQA

10 20 30 40 50 60

70 80 90 100 110 120

orf10.pep YVREYYATADKDTLFKTLFLPPLLSAAAIAALLLSRPSLPSEILFSLDDAAAGIGLVLFE
I I I I I I I : I I I 1 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
orf10a YVREYYAAADKDTLFKTLFLPPLLSAAAIAALLLSRPSLPSEILFSLDDAAAGIGLVLFE

70 80 90 100 110 120

130 140 150 160 170 180

orf10.pep LSFLPIRFLLLVLRMEGRALAFSSAQLVPKLAILLLXPLTVGLLHFPANTAVLTAVYALA
I I II I 1 I II II I I I I I I I I I I I I 1 I I I I llllill I I I I I I 1 I I I II II I I I I I I I I I 
orf10a LSFLPIRFLLLVLRMEGRALAFSSAQLVSKLAILLLLPLTVGLLHFPANTAVLTAVYALA
130 140 150 160 170 180

190 200 210 220 230 240

orf10.pep NLAAAAFLLFQNRCRLKAVRHAPFSPAVLHRGXRYGIPIALSSIAYWGLASADRLFLKKY
I I I I I I I II I I I I I I I II I I : II I I II I I I I I I I I I I I I I I I I I I I II I I I I II I I I I 
orf10a NLAAAAFLLFQNRCRLKAVRRAPFSSAVLHRGLRYGIPIALSSIAYWGLASADRLFLKKY

190 200 210 220 230 240

250 260 270 280 290 300

orf10.pep AGLEQLGVYSMGISFGGAALLFQSIFSTVWTPYIFRAIEENAPPARLSATAESAAALLAS
I I I I I I I I I I I I I I I I I I II I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
orf10a AGLEQLGVYSMGISFGGAALLFQSIFSTVWTPYIFRAIEANAPPARLSATAESAAALLAS
250 260 270 280 290 300
310 320 330 340 350 360

orf10.pep ALCXTGIFSPLASLLLPENYAAVRFIVVSCMXPPLFCTLAEISGIGLNVVRKTRPIALAT
III I I I I I I I I I I I I I I I I I I I I I I I I I I I III I III: III I III I I II I I II I I III 
orf10a ALCLTGIFSPLASLLLPENYAAVRFIVVSCMLPPLFCTLVEISGIGLNVVRKTRPIALAT

310 320 330 340 350 360 



370 380 390 400 410 419 

orf10.pep LGALAANLLLLGLDRAVPAR-PXGAAVACAASFWLFFAFKTESSCRLWQPLKRLPLYLHT
INI MINIMI III: M I M I M M I M I : M M M M M I I M M M I : I I 
orf10a LGALAANLLLLGL--AVPSGGARGAAVACAASFWLFFVFKTESSCRLWQPLKRLPLYMHT
370 380 390 400 410 

420 430 440 450 460 470 

orf10.pep LFCLTSSAAYTCFGTPANYPLFAGVWAAYLAGCILRHRKDLHKLFHYLKKQGFPLX
I I I I : I I I I I I I I I I I I I I I II I 1 I I I : I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
orf10a LFCLASSAAYTCFGTPANYPLFAGVWAVYLAGCILRHRKDLHKLFHYLKKQGFPLX
420 430 440 450 460 470 

The complete length ORF10a nucleotide sequence <SEQ ID 377> is:



1 ATGGACACAA AAGAAATCCT CGGCTACGCG GCAGGCTCGA TCGGCAGCGC

51 GGTTTTAGCC GTCATCATCC TGCCGCTGCT GTCGTGGTAT TTCCCTGCCG

101 ACGACATCGG ACGCATCGTG CTGATGCAGA CGGCGGCGGG GCTGACGGTG

151 TCGGTGTTGT GCCTCGGGCT GGATCAGGCA TACGTCCGCG AATACTATGC

201 CGCCGCCGAC AAAGACACTT TGTTCAAAAC CCTGTTCCTG CCGCCGCTGC

251 TGTCTGCCGC CGCGATAGCC GCCCTGCTGC TTTCCCGCCC ATCCCTGCCG

301 TCTGAAATCC TGTTTTCGCT CGACGATGCC GCCGCCGGCA TCGGGCTGGT

351 GCTGTTTGAA CTGAGCTTCC TGCCCATCCG CTTTCTCTTA CTGGTTTTGC

401 GTATGGAAGG ACGCGCCCTT GCCTTTTCGT CCGCGCAACT CGTGTCCAAG

451 CTCGCCATCC TGCTGCTGCT GCCGCTGACG GTCGGGCTGC TGCACTTTCC

501 GGCGAACACC GCCGTCCTGA CCGCCGTTTA CGCGCTGGCA AACCTTGCCG

551 CCGCCGCCTT TTTGCTGTTT CAAAACCGAT GCCGTCTGAA GGCCGTCCGG

601 CGCGCACCGT TTTCATCCGC CGTCCTGCAT CGCGGCCTGC GCTACGGCAT

651 ACCGATCGCA CTAAGCAGCA TCGCCTATTG GGGGCTGGCA TCCGCCGACC

701 GTTTGTTCCT GAAAAAATAT GCCGGCCTAG AACAGCTCGG CGTTTATTCG

751 ATGGGTATTT CGTTCGGCGG AGCGGCATTA TTGTTCCAAA GCATCTTTTC

801 AACGGTCTGG ACACCGTATA TTTTCCGCGC AATCGAAGCA AACGCCCCGC

851 CCGCCCGCCT CTCGGCAACG GCAGAATCCG CCGCCGCCCT GCTTGCCTCC

901 GCCCTCTGCC TGACCGGCAT TTTCTCGCCC CTCGCCTCCC TCCTGCTGCC

951 GGAAAACTAC GCCGCCGTCC GGTTTATCGT CGTATCGTGT ATGCTGCCTC

1001 CGCTGTTTTG CACGCTGGTA GAAATCAGCG GCATCGGTTT GAACGTCGTC

1051 CGAAAAACAC GCCCGATCGC GCTCGCCACC TTGGGCGCGC TGGCGGCAAA

1101 CCTGCTGCTG CTGGGGCTTG CCGTACCGTC CGGCGGCGCG CGCGGCGCGG

1151 CGGTTGCCTG TGCCGCCTCA TTTTGGCTGT TTTTTGTTTT CAAGACCGAA

1201 AGCTCCTGCC GCCTGTGGCA GCCGCTCAAA CGCCTGCCGC TTTATATGCA

1251 CACATTGTTC TGCCTGGCCT CCTCGGCGGC CTACACCTGC TTCGGCACTC

1301 CGGCAAACTA CCCCCTGTTT GCCGGCGTAT GGGCGGTATA TCTGGCAGGC

1351 TGCATCCTGC GCCACCGGAA AGATTTGCAC AAACTGTTTC ATTATTTGAA

1401 AAAACAAGGT TTCCCATTAT GA



This encodes a protein having amino acid sequence <SEQ ID 378>: 



1 MDTKEILGYA AGSIGSAVLA VIILPLLSWY FPADDIGRIV LMQTAAGLTV 

51 SVLCLGLDQA YVREYYAAAD KDTLFKTLFL PPLLSAAAIA ALLLSRPSLP 

101 SEILFSLDDA AAGIGLVLFE LSFLPIRFLL LVLRMEGRAL AFSSAQLVSK 

151 LAILLLLPLT VGLLHFPANT AVLTAVYALA NLAAAAFLLF QNRCRLKAVR 

201 RAPFSSAVLH RGLRYGIPIA LSSIAYWGLA SADRLFLKKY AGLEQLGVYS 

251 MGISFGGAAL LFQSIFSTVW TPYIFRAIEA NAPPARLSAT AESAAALLAS 

301 ALCLTGIFSP LASLLLPENY AAVRFIVVSC MLPPLFCTLV EISGIGLNVV

351 RKTRPIALAT LGALAANLLL LGLAVPSGGA RGAAVACAAS FWLFFVFKTE 

401 SSCRLWQPLK RLPLYMHTLF CLASSAAYTC FGTPANYPLF AGVWAVYLAG 

451 CILRHRKDLH KLFHYLKKQG FPL* 

ORF10a and ORF10-1 show 95.4% identity in 475 aa overlap:



10 20 30 40 50 60

orf10-1.pep MDTKEILXYAAGSIGSAVLAVIILPLLSWYFPADDIGRIVLMQTAAGLTVSVLCLGLDQA
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I II I I I I I I I I I I I I I I I I I | | | I I I I 
orf10a MDTKEILGYAAGSIGSAVLAVIILPLLSWYFPADDIGRIVLMQTAAGLTVSVLCLGLDQA

10 20 30 40 50 60 






70 80 90 100 110 120 

orf10-1.pep YVREYYATADKDTLFKTLFLPPLLSAAAIAALLLSRPSLPSEILFSLDDAAAGIGLVLFE
I I I I I I I : I I I I I I I I I I I I I I I I I I I I i I I I j I I I I I I I I II I I I I I I I I I I I I I I I I I 
orf10a YVREYYAAADKDTLFKTLFLPPLLSAAAIAALLLSRPSLPSEILFSLDDAAAGIGLVLFE

70 80 90 100 110 120 

130 140 150 160 170 180 

orf10-1.pep LSFLPIRFLLLVLRMEGRALAFSSAQLVPKLAILLLXPLTVGLLHFPANTAVLTAVYALA
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I 
orf10a LSFLPIRFLLLVLRMEGRALAFSSAQLVSKLAILLLLPLTVGLLHFPANTAVLTAVYALA

130 140 150 160 170 180 

190 200 210 220 230 240 

orf10-1.pep NLAAAAFLLFQNRCRLKAVRHAPFSPAVLHRGXRYGIPIALSSIAYWGLASADRLFLKKY
I I I I I I I I I I I I I I I I I I I I : I I I I I I I I I I I I I I I I I I I 1 I I I I I I I I I I I I I I I I I 
orf10a NLAAAAFLLFQNRCRLKAVRRAPFSSAVLHRGLRYGIPIALSSIAYWGLASADRLFLKKY

190 200 210 220 230 240 

250 260 270 280 290 300 

orf10-1.pep AGLEQLGVYSMGISFGGAALLFQSIFSTVWTPYIFRAIEENAPPARLSATAESAAALLAS
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
orf10a AGLEQLGVYSMGISFGGAALLFQSIFSTVWTPYIFRAIEANAPPARLSATAESAAALLAS

250 260 270 280 290 300 

310 320 330 340 350 360 

orf10-1.pep ALCXTGIFSPLASLLLPENYAAVRFIVVSCMXPPLFCTLAEISGIGLNVVRKTRPIALAT
III I I I I I I I I I I I I I I I I II I I II I I I I I I I I I I I I : I I I I I I I I M I I I I I I I I I I 
orf10a ALCLTGIFSPLASLLLPENYAAVRFIVVSCMLPPLFCTLVEISGIGLNVVRKTRPIALAT

310 320 330 340 350 360 

370 380 390 400 410 419 

orf10-1.pep LGALAANLLLLGLDRAVPAR-PXGAAVACAASFWLFFAFKTESSCRLWQPLKRLPLYLHT
I I I I I I I I I I I I I Ml: II I I I I I I I I I I II : I I I I II II I I I II I I I I II : I I 
orf10a LGALAANLLLLGL--AVPSGGARGAAVACAASFWLFFVFKTESSCRLWQPLKRLPLYMHT

370 380 390 400 410 

420 430 440 450 460 470 

orf10-1.pep LFCLTSSAAYTCFGTPANYPLFAGVWAAYLAGCILRHRKDLHKLFHYLKKQGFPLX
I I I I : I I I I I I I I I I I I I I I I I I I I I I : I I I I I I I I I I I I I I I I I I I I M I I I I I I 
orf10a LFCLASSAAYTCFGTPANYPLFAGVWAVYLAGCILRHRKDLHKLFHYLKKQGFPLX
420 430 440 450 460 470 

Homology with a predicted ORF from N. gonorrhoeae 

ORF10 shows 94.1% identity over a 475aa overlap with a predicted ORF (ORF10.ng) from N.
gonorrhoeae: 



orf lOng.pep 



orf lOnm 



orf lOng.pep 
orflOnm 
orf lOng . pep 



orf lOnm 



orf lOng.pep 
orflOnm 



MDTKEILGYAAGSIGSAVLAVIILPLLSWYFPADDIGRIVLMQTAAGLTVSVLCLGLDQA 60 

I I 1 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I 1 I I I I I 
MDTKEILXYAAGSIGSAVLAVIILPLLSWYFPADDIGRIVLMQTAAGLTVSVLCLGLDQA 60 

YVREYYAAADKDTLFKTLFLPPLLFSAAIAALLLSRPSLPSEILFSLDDAAAGIGLVLFE 120 
I I I I I I I : I I I I I I II I I I I I I I I : I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
YVREYYATADKDTLFKTLFLPPLLSAAAIAALLLSRPSLPSEILFSLDDAAAGIGLVLFE 120 

orf10ng.pep LSFLPIRFLLLVLRMEGRALAFSSAQLVPKLAILLLLPLTVGLLHFPANTSVLTAVYALA 180

I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I II I I I I I I I I : I I I I I I I I I 
orf10nm     LSFLPIRFLLLVLRMEGRALAFSSAQLVPKLAILLLXPLTVGLLHFPANTAVLTAVYALA 180

orf10ng.pep NLAAAAFLLFQNRCRLKAVRRAPFSPAVLHRGLRYGIPLALSSLAYWGLASADRLFLKKY 240
I I I I I I I I I I I I I I I II I I I: I I I II I I I I I I I I I I I : I I I I : I I II I I I I I I I I II I I 
orf10nm     NLAAAAFLLFQNRCRLKAVRHAPFSPAVLHRGXRYGIPIALSSIAYWGLASADRLFLKKY 240



orf10ng.pep AGLEQLGVYSMGISFGGAALLLQSIFSTVWTPYIFRAIEENATPARLSATAESAAALLAS 300

I I I II I I I I I I I II I I I I I I I : I I I I I I I II I I I I I I I I I I I I I I I I I I I 1 I I I I I I I I 
orf10nm     AGLEQLGVYSMGISFGGAALLFQSIFSTVWTPYIFRAIEENAPPARLSATAESAAALLAS 300



orf10ng.pep ALCLTGIFSPLASLLLPENYAAVRFTVVSCMLPPLFYTLTEISGIGLNVVRKTRPIALAT 360

1 I I I II I I 1 I I I I I I I I I I II I I I Mill I I I I I I : I I I I I I I I I I I I I I I I I I I I 
orf10nm     ALCXTGIFSPLASLLLPENYAAVRFIVVSCMXPPLFCTLAEISGIGLNVVRKTRPIALAT 360
370 380 390 400 410

orf10ng.pep LGALAANLLLLGL--AVPSGGTRGAAVACAASFWLFFVFKTESSCRLWQPLKRLPLYMHT
I I I I I I I I I I I I I III: I I I I I I I I I I I I I I : I I I I I I I I I I I I I I II I I I : I I 
orf10nm     LGALAANLLLLGLDRAVPAR-PXGAAVACAASFWLFFAFKTESSCRLWQPLKRLPLYLHT

370 380 390 400 410 



420 430 440 450 460 470 

orf10ng.pep LFCLASSAAYTCFGTPANYPLFAGVWAAYLAGCILRHRKNLHKLFHYLKKQGFPLX
I I I I : I I I I I I I 1 I I I I I I I I I I I I I I I I I II I I I I I I I : I I I II I I I I I I I I I I I 
orf10nm     LFCLTSSAAYTCFGTPANYPLFAGVWAAYLAGCILRHRKDLHKLFHYLKKQGFPLX
420 430 440 450 460 470 

The complete length ORF10ng nucleotide sequence <SEQ ID 379> is:



1 ATGGACACAA AAGAAATCCT CGGCTACGCG GCAGGCTCGA TCGGCAGCGC

51 GGTTTTAGCC GTCATCATCC TGCCGCTGCT GTCGTGGTAT TTCcccgCCG

101 ACGACATCGG GCGCATCGTG CTGATGCAGA CGGCGGCGGG ACTGACGGTG

151 TCGGTATTGT GCCTCGGGCT GGATCAGGCA TACGTCCGCG AATACTATGC

201 CGCCGCCGAC AAAGACACTT TGTTCAAAAC CCTGTTCCTG CCGCCGCTGC

251 TGTTTTCCGC CGCGATAGCC GCCCTGCTGC TTTCCCGCCC GTCCCTGCCG

301 TCTGAAATCC TGTTTTCGCT CGACGATGCC GCCGCCGGCA TCGGGCTGGT

351 GCTGTTTGAA CTGAGCTTCC TGCCCATCCG CTTTCTCTTA CTGGTTTTGC

401 GTATGGAAGG GCGCGCCCTT GCCTTTTCGT CCGCGCAACT CGTGCCCAAA

451 CTCGCCATTC TGCTGCTGTT GCCGCTGACG GTCGGGCTGC TGCACTTTCC

501 GGCGAACACC TCCGTCCTGA CCGCCGTTTA CGCGCTGGCA AACCTTGCCG

551 CCGCCGCCTT TTTGCTGTTT CAAAACCGAT GCCGTCTGAA GGCCGTCCGG

601 CGCGCGCCGT TTTCGCCCGC CGTCCTGCAC CGGGGGCTGC GCTACGGCAT

651 ACCGCTCGCA CTGAGCAGCC TTGCCTATTG GGGGCTGGCA TCCGCCGACC

701 GTTTGTTCCT GAAAAAATAT GCGGGCCTGG AACAGCTCGG CGTTTATTCG

751 ATGGGTATTT CGTTCGGCGG GGCGGCATTA TTGCTCCAAA GCATCTTTTC

801 AACGGTCTGG ACACCGTATA TTTTCCGTGC AATCGAAGAA AACGCCACGC

851 CCGCCCGCCT CTCGGCAACG GCAGAATCCG CCGCCGCCCT GCTTGCCTCC

901 GCCCTCTGCC TGACCGGAAT TTTCTCGCCC CTCGCCTCCC TCCTGCTGCC

951 GGAAAACTAC GCCGCCGTCC GGTTTACCGT CGTATCGTGT ATGCTGccgc

1001 cgctGTTTTA CACGCTGACC GAAATCAGCG GCATCGGTTT GAACGTCGTC

1051 CGCAAAACGC GTCCGATCGC GCTTGCCACC TTGGGCGCGC TGGCGGCAAA

1101 CCTGCTGCTG CTGGGGCTTG CCGTACCGTC CGGCGGCACG CGCGGCGCGG

1151 CGGTTGCCTG TGCCGCCTCA TTCTGGTTGT TTTTTGTTTT CAAGACAGAA

1201 AGCTCCTGCC GCCTGTGGCA GCCGCTCAAA CGCCTGCCGC TTTATATGCA

1251 CACATTGTTC TGCCTgGCCT CCTCGGCGGC CTACACCTGC TTCGGCACAC

1301 CGGCAAACTA CCCcctgttt gccggcgtAT GGGCGGCATA TCTGGCAGGC

1351 TGCATCCTGC GCCACCGGAA AAATTTGCAC AAACTGTTTC ATTATTTGAA

1401 AAAACAAGGT TTCCCATTAT GA



This encodes a protein having amino acid sequence <SEQ ID 380>: 



1 MDTKEILGYA AGSIGSAVLA VIILPLLSWY FPADDIGRIV LMQTAAGLTV

51 SVLCLGLDQA YVREYYAAAD KDTLFKTLFL PPLLFSAAIA ALLLSRPSLP

101 SEILFSLDDA AAGIGLVLFE LSFLPIRFLL LVLRMEGRAL AFSSAQLVPK

151 LAILLLLPLT VGLLHFPANT SVLTAVYALA NLAAAAFLLF QNRCRLKAVR

201 RAPFSPAVLH RGLRYGIPLA LSSLAYWGLA SADRLFLKKY AGLEQLGVYS

251 MGISFGGAAL LLQSIFSTVW TPYIFRAIEE NATPARLSAT AESAAALLAS

301 ALCLTGIFSP LASLLLPENY AAVRFTVVSC MLPPLFYTLT EISGIGLNVV

351 RKTRPIALAT LGALAANLLL LGLAVPSGGT RGAAVACAAS FWLFFVFKTE

401 SSCRLWQPLK RLPLYMHTLF CLASSAAYTC FGTPANYPLF AGVWAAYLAG 

451 CILRHRKNLH KLFHYLKKQG FPL* 

ORF10ng and ORF10-1 show 96.4% identity in 473 aa overlap:



10 20 30 40 50 60 

orf10-1.pep MDTKEILGYAAGSIGSAVLAVIILPLLSWYFPADDIGRIVLMQTAAGLTVSVLCLGLDQA
III I M M IMIIII M MUM M I II II M Ml I II I ! II I I II Ml I II III llil I 
orf10ng-1   MDTKEILGYAAGSIGSAVLAVIILPLLSWYFPADDIGRIVLMQTAAGLTVSVLCLGLDQA

10 20 30 40 50 60 



70 80 90 100 110 120 

orf10-1.pep YVREYYATADKDTLFKTLFLPPLLSAAAIAALLLSRPSLPSEILFSLDDAAAGIGLVLFE
I II I I II : I I I I I I I I I I I I I I I I : I I I I I I I I I I I I I I I I I I I I I I I I I I I I I i I I I I 
orf10ng-1   YVREYYAAADKDTLFKTLFLPPLLFSAAIAALLLSRPSLPSEILFSLDDAAAGIGLVLFE

70 80 90 100 110 120 



WO 99/24578 



130 140 150 160 170 180

orf10-1.pep LSFLPIRFLLLVLRMEGRAIAFSSAQLVPKLAILLLLPLTVGLLHFPANTAVLTAVYALA
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I : I I I I I I I I I 
orf10ng-1   LSFLPIRFLLLVLRMEGRALAFSSAQLVPKLAILLLLPLTVGLLHFPANTSVLTAVYALA

130 140 150 160 170 180 



190 200 210 220 230 240 

orf10-1.pep NLAAAAFLLFQNRCRLKAVRHAPFSPAVLHRGLRYGIPIALSSIAYWGLASADRLFLKKY
I I I I I I I I I I I I I I I I I I I I : I I I I I I I I I I I I I I I I I : I i I I : I I I I I I I I I I I I I I I I 
orf10ng-1   NLAAAAFLLFQNRCRLKAVRRAPFSPAVLHRGLRYGIPLALSSLAYWGLASADRLFLKKY

190 200 210 220 230 240 

250 260 270 280 290 300 

orf10-1.pep AGLEQLGVYSMGISFGGAALLFQSIFSTVWTPYIFRAIEENAPPARLSATAESAAALLAS
I I I I I I I I I I I I I I I I I I I I I : I I I I I I II I II I I I I I I I I I I I I I I I 1 I I I I I I I I I I 
orf10ng-1   AGLEQLGVYSMGISFGGAALLLQSIFSTVWTPYIFRAIEENATPARLSATAESAAALLAS

250 260 270 280 290 300 



310 320 330 340 350 360 

orf10-1.pep ALCLTGIFSPLASLLLPENYAAVRFIVVSCMLPPLFCTLAEISGIGLNVVRKTRPIALAT
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I : I I I I I I I I I II I II II I I II 
orf10ng-1   ALCLTGIFSPLASLLLPENYAAVRFTVVSCMLPPLFYTLTEISGIGLNVVRKTRPIALAT

310 320 330 340 350 360 



370 380 390 400 410 420 

orf10-1.pep LGALAANLLLLGLAVPSGGARGAAVACAASFWLFFAFKTESSCRLWQPLKRLPLYLHTLF
I I I I I I I I I I I I I I I I I I I : I I I I I I I I I I I I I I I : I II I I I I I I I I I I I I I I I I : I I I I 
orf10ng-1   LGALAANLLLLGLAVPSGGTRGAAVACAASFWLFFVFKTESSCRLWQPLKRLPLYMHTLF

370 380 390 400 410 420 



430 440 450 460 470 

orf10-1.pep CLTSSAAYTCFGTPANYPLFAGVWAAYLAGCILRHRKDLHKLFHYLKKQGFPLX
I I: II I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I : I I I I I I I I I I I II I I I 
orf10ng-1   CLASSAAYTCFGTPANYPLFAGVWAAYLAGCILRHRKNLHKLFHYLKKQGFPLX

430 440 450 460 470 



Based on this analysis, including the presence of a putative leader peptide, several
transmembrane segments and a leucine-zipper motif (four Leu residues, each separated from the next by
six residues; shown in bold), it is predicted that these proteins from N. meningitidis and N. gonorrhoeae,
and their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies.
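The leucine-zipper criterion stated above (four Leu residues, each separated from the next by six residues) can be expressed as a simple pattern search. The sketch below is illustrative only and is not the analysis used in this work: it assumes a plain one-letter protein string, uses a regular-expression lookahead so that overlapping candidates are all reported, and ignores the other heptad positions and any coiled-coil scoring. The function name `leucine_zipper_starts` is hypothetical.

```python
import re

# Leu at positions i, i+7, i+14 and i+21 (six residues between consecutive Leu).
# The lookahead lets overlapping candidate runs be reported separately.
ZIPPER = re.compile(r"(?=(L.{6}L.{6}L.{6}L))")


def leucine_zipper_starts(protein: str) -> list[int]:
    """Return 1-based start positions of every L-x(6)-L-x(6)-L-x(6)-L run."""
    seq = "".join(protein.split()).upper()  # tolerate 10-residue block spacing
    return [match.start() + 1 for match in ZIPPER.finditer(seq)]


if __name__ == "__main__":
    demo = "AA" + "L" + "A" * 6 + "L" + "G" * 6 + "L" + "K" * 6 + "L" + "AA"
    print(leucine_zipper_starts(demo))  # [3]
```

A scan of this kind only flags candidates; a heptad match on its own does not establish that a segment forms a zipper.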



Example 45 



The following partial DNA sequence was identified in N. meningitidis <SEQ ID 381>:

1 . . ATCCTGAAAC CGCATAACCA GCTTAAGGAA GACATCCAAC CTGATCCGGC 

51 CGATCAAAAC GCCTTGTCCG AACCGGATGC TGCGACAGAG GCAGAGCAGT 

101 CGGATGCGGA AAATGCTGCC GACAAGCAGC CCGTTGCCGA TAAAGCCGAC 

151 GAGGTTGAAG AAAAGGCGGG CGAGCCGGAA CGGGAAGAGC CGGACGGACA 

201 GGCAGTGCGT AAGAAAGCGC TGACGGAAGA GCGTGAACAA ACCGTCAGGG 

251 AAAAAGCGCA GAAGAAAGAT GCCGAAACGG TTAAAATACA AGCGGTAAAA 

301 CCGTCTAAAG AAACAGAGAA AAAAGCTTCA AAAGAAGAGA AAAAGGCGGC 

351 GAAGGAAAAA GTTGCACCCA AACCAACCCC GGAACAAATC CTCAACAGCG 

401 GCAgCATCGA AAAmGCGCGC AgTGCCGCCG CCAAAGAAGT GCAGAAAATG 

451 AA. AACGTCC GACAAGGCGG AAGC.AACGC ATTATCTGCA AATGGGCGCG 

501 TATGCCGACC GTCAGAGCGC GGAAGGGCAG CGTGCCAAAC TGGCAATCTT 

551 GGGCATATCT TCCAAGGTGG TCGGTTATCA GGCGGGACAT AAAACGCTTT 

601 ACCGGGTGCA AAGCGGCAAT ATGTCTGCCG ATGCGGTGA 

This corresponds to the amino acid sequence <SEQ ID 382; ORF65>: 



1 . . ILKPHNQLKE DIQPDPADQN ALSEPDAATE AEQSDAENAA DKQPVADKAD 
51 EVEEKAGEPE REEPDGQAVR KKALTEEREQ TVREKAQKKD AETVKIQAVK 
101 PSKETEKKAS KEEKKAAKEK VAPKPTPEQI LNSGSIEXAR SAAAKEVQKM
151 XNVRQGGSXR IICKWARMPT VRARKGSVPN WQSWAYLPRW SVIRRDIKRF 
201 TGCKAAICLP MR* 
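Each amino acid listing in these examples is the codon-by-codon translation of the corresponding nucleotide listing. A minimal sketch of that step follows. It assumes the standard genetic code (NCBI translation table 1). It also assumes that IUPAC ambiguity letters in a listing (for instance the "m" in "AAAmGCG" of SEQ ID 381) should give X unless every possible reading encodes the same residue. Under those assumptions it reproduces the "SIEXAR" stretch of ORF65. The helper names are hypothetical, and this section does not state which software produced the translations.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1). Codons are enumerated with the
# bases in T, C, A, G order: TTT, TTC, TTA, TTG, TCT, ... GGG.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}

# IUPAC nucleotide codes and the concrete bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def translate_codon(codon: str) -> str:
    """Translate one codon; return X if ambiguity leaves more than one residue."""
    choices = [IUPAC.get(base) for base in codon.upper()]
    if None in choices:  # '.', '-' or any other non-IUPAC symbol
        return "X"
    residues = {CODON_TABLE["".join(bases)] for bases in product(*choices)}
    return residues.pop() if len(residues) == 1 else "X"


def translate(dna: str) -> str:
    """Translate reading frame 1 of a listing, ignoring whitespace and letter case."""
    seq = "".join(dna.split()).upper()
    return "".join(translate_codon(seq[i:i + 3]) for i in range(0, len(seq) - 2, 3))


if __name__ == "__main__":
    print(translate("ATGGACACAA AAGAAATCCT CGGCTACGCG"))  # MDTKEILGYA
    print(translate("AGCATCGAAAAmGCGCGC"))                # SIEXAR
```

Resolving ambiguity this way keeps synonymous positions (for example GCN, which is always Ala) from being reported as X.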

Further work revealed the complete nucleotide sequence <SEQ ID 383>: 



1 ATGTTTATGA ACAAATTTTC CCAATCCGGA AAAGGTCTGT CCGGTTTTTT 

51 CTTCGGTTTG ATACTGGCGA CGGTCATTAT TGCCGGTATT TTGTTTTATC 

101 TGAACCAGAG CGGTCAAAAT GCGTTCAAAA TCCCGGCTTC GTCGAAGCAG 

151 CCTGCAGAAA CGGAAATCCT GAAACCGAAA AACCAGCCTA AGGAAGACAT 

201 CCAACCTGAA CCGGCCGATC AAAACGCCTT GTCCGAACCG GATGCTGCGA 

251 CAGAGGCAGA GCAGTCGGAT GCGGAAAAAG CTGCCGACAA GCAGCCCGTT 

301 GCCGATAAAG CCGACGAGGT TGAAGAAAAG GCGGGCGAGC CGGAACGGGA 

351 AGAGCCGGAC GGACAGGCAG TGCGTAAGAA AGCGCTGACG GAAGAGCGTG 

401 AACAAACCGT CAGGGAAAAA GCGCAGAAGA AAGATGCCGA AACGGTTAAA 

451 AAACAAGCGG TAAAACCGTC TAAAGAAACA GAGAAAAAAG CTTCAAAAGA 

501 AGAGAAAAAG GCGGCGAAGG AAAAAGTTGC ACCCAAACCA ACCCCGGAAC 

551 AAATCCTCAA CAGCGGCAGC ATCGAAAAAG CGCGCAGTGC CGCCGCCAAA 

601 GAAGTGCAGA AAATGAAAAC GTCCGACAAG GCGGAAGCAA CGCATTATCT 

651 GCAAATGGGC GCGTATGCCG ACCGTCAGAG CGCGGAAGGG CAGCGTGCCA 

701 AACTGGCAAT CTTGGGCATA TCTTCCAAGG TGGTCGGTTA TCAGGCGGGA 

751 CATAAAACGC TTTACCGGGT GCAAAGCGGC AATATGTCTG CCGATGCGGT 

801 GAAAAAAATG CAGGACGAGT TGAAAAAACA TGAAGTCGCC AGCCTGATCC 

851 GTTCTATCGA AAGCAAATAA 

This corresponds to the amino acid sequence <SEQ ID 384; ORF65-1>:



1 MFMNKFSQSG KGLSGFFFGL ILATVIIAGI LFYLNQSGQN AFKIPASSKQ

51 PAETEILKPK NQPKEDIQPE PADQNALSEP DAATEAEQSD AEKAADKQPV

101 ADKADEVEEK AGEPEREEPD GQAVRKKALT EEREQTVREK AQKKDAETVK

151 KQAVKPSKET EKKASKEEKK AAKEKVAPKP TPEQILNSGS IEKARSAAAK

201 EVQKMKTSDK AEATHYLQMG AYADRQSAEG QRAKLAILGI SSKVVGYQAG

251 HKTLYRVQSG NMSADAVKKM QDELKKHEVA SLIRSIESK* 
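Going from the partial listing (SEQ ID 381) to the complete ORF65-1 (SEQ ID 383/384) amounts to locating a reading frame that runs from an ATG start codon to the next in-frame stop codon (here TAA). A simplified forward-strand ORF scan is sketched below. It is an illustration under stated assumptions, not the gene-prediction method used for this work, which this section does not describe: it ignores the reverse strand, alternative start codons such as GTG, and any length or coding-potential filter beyond `min_codons`, which is a hypothetical parameter.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna: str, min_codons: int = 1) -> list[tuple[int, int]]:
    """Return (start, end) pairs, 0-based and end-exclusive, for ATG...stop ORFs.

    Only the three forward frames are scanned. In each frame the first ATG after
    the previous stop opens an ORF, which closes at the next in-frame stop codon.
    min_codons counts sense codons (start codon included, stop codon excluded).
    """
    seq = "".join(dna.split()).upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return sorted(orfs)


if __name__ == "__main__":
    print(find_orfs("CCATGAAATTTTAGCC"))  # [(2, 14)]
```

Applied to SEQ ID 383 with a large `min_codons`, a scan like this should report a single ATG...TAA frame spanning the listing, but that expectation is stated here without having been run on the full sequence.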

Computer analysis of this amino acid sequence gave the following results: 



Homology with a predicted ORF from N. meningitidis (strain A) 

ORF65 shows 92.0% identity over a 150aa overlap with an ORF (ORF65a) from strain A of N. 
meningitidis: 



10 20 30 

orf65.pep   ILKPHNQLKEDIQPDPADQNALSEPDAATE

IIM:H I I I I I I : I I I I 1 I I I I II I I I 
orf65a      IIAGILFYLNQSGQNAFKIPVPSKQPAETEILKPKNQPKEDIQPEPADQNALSEPDAAKE
30 40 50 60 70 80 



40 50 60 70 80 90 

orf65.pep   AEQSDAENAADKQPVADKADEVEEKAGEPEREEPDGQAVRKKALTEEREQTVREKAQKKD
I I I I I I I: I I I I I I I I I I I I I I I I I I Mill: I I I I I I I I I I I I I I I 11 I INI III 
orf65a      AEQSDAEKAADKQPVADKADEVEEKADEPEREKSDGQAVRKKALTEEREQTVGEKAQKKD

90 100 110 120 130 140 



100 110 120 130 140 150 

orf65.pep   AETVKIQAVKPSKETEKKASKEEKKAAKEKVAPKPTPEQILNSGSIEXARSAAAKEVQKM
MM! I I I I I I I I 1 I I I I I I I I I I I I I ! I I I I ! I I I I I I I I I I I I I I I I I I I I I I I I 
orf65a      AETVKKQAVKPSKETEKKASKEEKKAEKEKVAPKPTPEQILNSGSIEKARSAAAKEVQKM
150 160 170 180 190 200 

160 170 180 190 200 210 

orf65.pep   XNVRQGGSXRIICKWARMPTVRARKGSVPNWQSWAYLPRWSVIRRDIKRFTGCKAAICLP

orf65a      KTPDKAEATHYLQMGAYADRRSAEGQRAKLAILGISSKVVGYQAGHKTLYRVQSGNMSAD
210 220 230 240 250 260 

The complete length ORF65a nucleotide sequence <SEQ ID 385> is: 



1 ATGTTTATGA ACAAATTTTC CCAATCCGGA AAAGGTCTGT CCGGTTTTTT 
51 CTTCGGTTTG ATACTGGCGA CGGTCATTAT TGCCGGTATT TTGTTTTATC 
101 TGAACCAGAG CGGTCAAAAT GCGTTCAAAA TCCCGGTTCC GTCGAAGCAG

151 CCTGCAGAAA CGGAAATCCT GAAACCGAAA AACCAGCCTA AGGAAGACAT 

201 CCAACCTGAA CCGGCCGATC AAAACGCCTT GTCCGAACCG GATGCTGCGA 

251 AAGAGGCAGA GCAGTCGGAT GCGGAAAAAG CTGCCGACAA GCAGCCCGTT 

301 GCCGACAAAG CCGACGAGGT TGAGGAAAAG GCGGACGAGC CGGAGCGGGA 

351 AAAGTCGGAC GGACAGGCAG TGCGCAAGAA AGCACTGACG GAAGAGCGTG 

401 AACAAACCGT CGGGGAAAAA GCGCAGAAGA AAGATGCCGA AACGGTTAAA 

451 AAACAAGCGG TAAAACCATC TAAAGAAACA GAGAAAAAAG CTTCAAAAGA 

501 AGAGAAAAAG GCGGAGAAGG AAAAAGTTGC ACCCAAACCG ACCCCGGAAC 

551 AAATCCTCAA CAGCGGCAGC ATCGAAAAAG CGCGCAGTGC CGCTGCCAAA 

601 GAAGTGCAGA AAATGAAAAC GCCCGACAAG GCGGAAGCAA CGCATTATCT 

651 GCAAATGGGC GCGTATGCCG ACCGCCGGAG CGCGGAAGGG CAGCGTGCCA 

701 AACTGGCAAT CTTGGGCATA TCTTCCAAGG TGGTCGGTTA TCAGGCGGGA 

751 CATAAAACGC TTTACCGGGT GCAAAGCGGC AATATGTCTG CCGATGCGGT 

801 GAAAAAAATG CAGGACGAGT TGAAAAAACA TGAAGTCGCC AGCCTGATCC 

851 GTTCTATCGA AAGCAAATAA 

This encodes a protein having amino acid sequence <SEQ ID 386>: 



1 MFMNKFSQSG KGLSGFFFGL ILATVIIAGI LFYLNQSGQN AFKIPVPSKQ

51 PAETEILKPK NQPKEDIQPE PADQNALSEP DAAKEAEQSD AEKAADKQPV

101 ADKADEVEEK ADEPEREKSD GQAVRKKALT EEREQTVGEK AQKKDAETVK

151 KQAVKPSKET EKKASKEEKK AEKEKVAPKP TPEQILNSGS IEKARSAAAK

201 EVQKMKTPDK AEATHYLQMG AYADRRSAEG QRAKLAILGI SSKVVGYQAG

251 HKTLYRVQSG NMSADAVKKM QDELKKHEVA SLIRSIESK* 

ORF65a and ORF65-1 show 96.5% identity in 289 aa overlap: 



10 20 30 40 50 60 

orf65a.pep  MFMNKFSQSGKGLSGFFFGLILATVIIAGILFYLNQSGQNAFKIPVPSKQPAETEILKPK
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I : I I I I I I I I I I I I I 
orf65-1     MFMNKFSQSGKGLSGFFFGLILATVIIAGILFYLNQSGQNAFKIPASSKQPAETEILKPK

10 20 30 40 50 60 



70 80 90 100 110 120 

orf65a.pep  NQPKEDIQPEPADQNALSEPDAAKEAEQSDAEKAADKQPVADKADEVEEKADEPEREKSD
I M II I I I I I I I I I I I I I I I i I I I I I I I I I I I I I I I I I I I I I I I I I I I I I Mill: I 
orf65-1     NQPKEDIQPEPADQNALSEPDAATEAEQSDAEKAADKQPVADKADEVEEKAGEPEREEPD

70 80 90 100 110 120 



130 140 150 160 170 180 

orf65a.pep  GQAVRKKALTEEREQTVGEKAQKKDAETVKKQAVKPSKETEKKASKEEKKAEKEKVAPKP
I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 1 I I I I I I I I I I I II I 
orf65-1     GQAVRKKALTEEREQTVREKAQKKDAETVKKQAVKPSKETEKKASKEEKKAAKEKVAPKP

130 140 150 160 170 180 



190 200 210 220 230 240 

orf65a.pep  TPEQILNSGSIEKARSAAAKEVQKMKTPDKAEATHYLQMGAYADRRSAEGQRAKLAILGI
I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I : I I I I I I I I I I I I I I 
orf65-1     TPEQILNSGSIEKARSAAAKEVQKMKTSDKAEATHYLQMGAYADRQSAEGQRAKLAILGI

190 200 210 220 230 240 



250 260 270 280 290 

orf65a.pep  SSKVVGYQAGHKTLYRVQSGNMSADAVKKMQDELKKHEVASLIRSIESKX
II I I I II II I I I I I I I I I I I 111 I I I I I I II I i I I I I I I I I I M I I I I I I 
orf65-1     SSKVVGYQAGHKTLYRVQSGNMSADAVKKMQDELKKHEVASLIRSIESKX

250 260 270 280 290 
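Identity figures such as the 96.5% over a 289 aa overlap quoted for ORF65a and ORF65-1 are the number of identical aligned positions divided by the length of the overlap. The sketch below recomputes that figure from SEQ ID 386 (ORF65a) and SEQ ID 384 (ORF65-1) as transcribed in this example. It reads residues 244-245 as VV, which is what the GTG GTC codons of both nucleotide listings encode. It assumes a gap-free, position-by-position comparison, which holds for this pair because both proteins are 289 residues long and the alignment shows no gaps, and it leaves gap columns out of the overlap. The program that produced the patent's figures may define the overlap differently, and the function names are hypothetical.

```python
def percent_identity(row_a: str, row_b: str) -> tuple[int, int, float]:
    """Compare two aligned rows of equal length; '-' marks a gap.

    Returns (identical positions, overlap length, percent identity), where
    columns containing a gap in either row are left out of the overlap.
    """
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    pairs = [(a, b) for a, b in zip(row_a, row_b) if a != "-" and b != "-"]
    identical = sum(1 for a, b in pairs if a == b)
    overlap = len(pairs)
    pct = 100.0 * identical / overlap if overlap else 0.0
    return identical, overlap, pct


def clean(listing: str) -> str:
    """Drop whitespace and a trailing stop symbol from a patent-style listing."""
    return "".join(listing.split()).rstrip("*")


ORF65A = clean("""
    MFMNKFSQSG KGLSGFFFGL ILATVIIAGI LFYLNQSGQN AFKIPVPSKQ
    PAETEILKPK NQPKEDIQPE PADQNALSEP DAAKEAEQSD AEKAADKQPV
    ADKADEVEEK ADEPEREKSD GQAVRKKALT EEREQTVGEK AQKKDAETVK
    KQAVKPSKET EKKASKEEKK AEKEKVAPKP TPEQILNSGS IEKARSAAAK
    EVQKMKTPDK AEATHYLQMG AYADRRSAEG QRAKLAILGI SSKVVGYQAG
    HKTLYRVQSG NMSADAVKKM QDELKKHEVA SLIRSIESK*
""")

ORF65_1 = clean("""
    MFMNKFSQSG KGLSGFFFGL ILATVIIAGI LFYLNQSGQN AFKIPASSKQ
    PAETEILKPK NQPKEDIQPE PADQNALSEP DAATEAEQSD AEKAADKQPV
    ADKADEVEEK AGEPEREEPD GQAVRKKALT EEREQTVREK AQKKDAETVK
    KQAVKPSKET EKKASKEEKK AAKEKVAPKP TPEQILNSGS IEKARSAAAK
    EVQKMKTSDK AEATHYLQMG AYADRQSAEG QRAKLAILGI SSKVVGYQAG
    HKTLYRVQSG NMSADAVKKM QDELKKHEVA SLIRSIESK*
""")

if __name__ == "__main__":
    same, overlap, pct = percent_identity(ORF65A, ORF65_1)
    print(f"{pct:.1f}% identity in {overlap} aa overlap ({same} identical)")
```

Run as a script, this prints "96.5% identity in 289 aa overlap (279 identical)", which agrees with the figure stated for this pair.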



Homology with a predicted ORF from N. gonorrhoeae 

ORF65 shows 89.6% identity over a 212aa overlap with a predicted ORF (ORF65.ng) from N. 
gonorrhoeae: 



30 40 50 60 70 80 

ORF65ng IIAGILLYLNQGGQNAFKIPAPSKQPAETEILKLKNQPKEDIQPEPADQNALSEPDVAKE 

III : I I I I I I I I : I I I II I I I I I I : I I 
ORF65 ILKPHNQLKEDIQPDPADQNALSEPDAATE 

10 20 30 
90 100 110 120 130 140

ORF65ng AEQSDAEKAADKQPVADKADEVEEKAGEPEREEPDGQAVRKKALTEEREQTVREKAQKKD 

I I I I I I I : I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I i I I I I I I 
ORF65 AEQSDAENAADKQPVADKADEVEEKAGEPEREEPDGQAVRKKALTEEREQTVREKAQKKD 
40 50 60 70 80 90

150 160 170 180 190 200 

ORF65ng AETVKKKAVKPSKETEKKASKEEKKAAKEKVAPKPTPEQILNSRSIEKARSAAAKEVQKM 

I 1 I I I : I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I i I I I I I I I I I I I I I I I I 
ORF65 AETVKIQAVKPSKETEKKASKEEKKAAKEKVAPKPTPEQILNSGSIEXARSAAAKEVQKM

100 110 120 130 140 150 

210 220 230 240 250 260 

ORF65ng KNFGQGGSQRIICKWARMPNPGARKGSVPNWQSWAYLPKWSAIRRDIKRFTACKAAICPP 
| MM MINIUM: I I I I I I I I I I I I I I I I : I I : I I I I I I I I I : i I I I I I I

ORF65 XNVRQGGSXRIICKWARMPTVRARKGSVPNWQSWAYLPRWSVIRRDIKRFTGCKAAICLP 
160 170 180 190 200 210 

ORF65ng MR 
||
ORF65 MR 

An ORF65ng nucleotide sequence <SEQ ID 387> was predicted to encode a protein having amino 
acid sequence <SEQ ID 388>: 

1 MFMNKFSQSG KGLSGFFFGL ILATVIIAGI LLYLNQGGQN AFKIPAPSKQ

51 PAETEILKLK NQPKEDIQPE PADQNALSEP DVAKEAEQSD AEKAADKQPV

101 ADKADEVEEK AGEPEREEPD GQAVRKKALT EEREQTVREK AQKKDAETVK 

151 KKAVKPSKET EKKASKEEKK AAKEKVAPKP TPEQILNSRS IEKARSAAAK 

201 EVQKMKNFGQ GGSQRIICKW ARMPNPGARK GSVPNWQSWA YLPKWSAIRR 

251 DIKRFTACKA AICPPMR* 

After further analysis, the complete gonococcal DNA sequence <SEQ ID 389> was found to be:

1 ATGTTTATGA ACAAATTTTC CCAATCCGGA AAAGGTCTGT CCGGTTTCTT 

51 CTTCGGTTTG ATACTGGCAA CGGTCATTAT TGCCGGTATT TTGCTTTATC 

101 TGAACCAGGG CGGTCAAAAT GCGTTCAAAA TCCCGGCTCC GTCGAAGCAG 

151 CCTGCAGAAA CGGAAATCCT GAAACTGAAA AACCAGCCTA AGGAAGACAT 

201 CCAACCTGAA CCGGCCGATC AAAACGCCTT GTCCGAACCG GATGTTGCGA

251 AAGAGGCAGA GCAGTCGGAT GCGGAAAAAG CTGCGGACAA GCAGCCCGTT 

301 GCCGACAAag ccgacgAGGT TGAAGAAAag GcGGgcgAgc cggaACGGga 

351 aGAGCCGGAC ggACAGGCAG TGCGCAAGAA AGCACTGAcg gAAGAgcGTG 

401 AACAAACcgt cagggAAAAA GCGCagaaga AAGATGCCGA AACGgTTAAA 

451 AAacaaGCgg tAaaaccgtc tAAAGAAACa gagaaaaaag cTtcaaaaga

501 agagaaaaag gcggcgaaag aaaAAGttgc acccaaaccg accccggaaC 

551 aaatcctcaa cagccgCagc atcgaaaaag cgcgtagtgc cgctgccaaa 

601 gaAgtgcaGA AAatgaaaaa ctTtgggcaa ggcgGaagcc aacgcattaT 

651 CTGcaaatgg gcgcgtatgc cgaccgtccg gagcgcggaA gggcagcgtg 

701 ccaaACtggc aAtcttgGgc atatctTccg aagtggtcgG CTATCAGGCG

751 GGACATAAAA CGCTTTACCG CGTGCAAagc GGCAatatgt ccgccgatgc 

801 gGTGAAAAAA ATGCAGGACG AGTTGAAAAA GCATGGGGtt gcCAGCCTGA 

851 TCCGTGcgAT TGAAGGCAAA TAA 

This encodes the following amino acid sequence <SEQ ID 390>: 

1 MFMNKFSQSG KGLSGFFFGL ILATVIIAGI LLYLNQGGQN AFKIPAPSKQ

51 PAETEILKLK NQPKEDIQPE PADQNALSEP DVAKEAEQSD AEKAADKQPV

101 ADKADEVEEK AGEPEREEPD GQAVRKKALT EEREQTVREK AQKKDAETVK

151 KQAVKPSKET EKKASKEEKK AAKEKVAPKP TPEQILNSRS IEKARSAAAK

201 EVQKMKNFGQ GGSQRIICKW ARMPTVRSAE GQRAKLAILG ISSEVVGYQA

251 GHKTLYRVQS GNMSADAVKK MQDELKKHGV ASLIRAIEGK *

ORF65ng-1 and ORF65-1 show 89.0% identity in 290 aa overlap:

10 20 30 40 50 60 

orf65-1.pep MFMNKFSQSGKGLSGFFFGLILATVIIAGILFYLNQSGQNAFKIPASSKQPAETEILKPK
I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I : I I I I : I I I I I I I I I I I I I I 1 I I I I I I 
orf65ng-1   MFMNKFSQSGKGLSGFFFGLILATVIIAGILLYLNQGGQNAFKIPAPSKQPAETEILKLK

10 20 30 40 50 60 
70 80 90 100 110 120

orf65-1.pep NQPKEDIQPEPADQNALSEPDAATEAEQSDAEKAADKQPVADKADEVEEKAGEPEREEPD
I I I I I I i I I I I I I I I I I I I II : I I I I I I I I I I I I I I I I I I I I I I II I I I I II I I I I I I I 
orf65ng-1   NQPKEDIQPEPADQNALSEPDVAKEAEQSDAEKAADKQPVADKADEVEEKAGEPEREEPD

70 80 90 100 110 120 

130 140 150 160 170 180 

orf65-1.pep GQAVRKKALTEEREQTVREKAQKKDAETVKKQAVKPSKETEKKASKEEKKAAKEKVAPKP

I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I M I I I I I I I I 1 1 I I I I I I I I 1 1 I I I I I 
orf65ng-1   GQAVRKKALTEEREQTVREKAQKKDAETVKKQAVKPSKETEKKASKEEKKAAKEKVAPKP

130 140 150 160 170 180 

190 200 210 220 230 239 

orf65-1.pep TPEQILNSGSIEKARSAAAKEVQKMKTSDKAEATHYL-QMGAYADRQSAEGQRAKLAILG
illlllll I MM MM I MM III: :::::::: : I I I I I I I I I I I I I 
orf65ng-1   TPEQILNSRSIEKARSAAAKEVQKMKNFGQGGSQRIICKWARMPTVRSAEGQRAKLAILG

190 200 210 220 230 240 

240 250 260 270 280 290 

orf65-1.pep ISSKVVGYQAGHKTLYRVQSGNMSADAVKKMQDELKKHEVASLIRSIESKX

II I : I I I I II I I I I I I I I I I I I I I II M I I M I I I I I I llllihlhll 
orf65ng-1   ISSEVVGYQAGHKTLYRVQSGNMSADAVKKMQDELKKHGVASLIRAIEGKX

250 260 270 280 290 



On this basis, including the presence of a putative transmembrane domain in the gonococcal 
protein, it is predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes,
could be useful antigens for vaccines or diagnostics, or for raising antibodies. 
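The transmembrane predictions in these examples are stated without naming the method used. One common approach, shown here only as an illustrative assumption, is a Kyte-Doolittle hydropathy scan: average the residue hydropathy values over a sliding window and flag windows whose mean reaches a cutoff. The 19-residue window and 1.6 cutoff are the values often quoted for membrane-spanning segments on this scale, not parameters taken from this document, and the function name `hydrophobic_windows` is hypothetical.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobic_windows(protein: str, window: int = 19,
                        cutoff: float = 1.6) -> list[tuple[int, float]]:
    """Return (1-based window start, mean hydropathy) for windows at or above cutoff.

    Windows containing X or any other non-standard symbol are skipped.
    """
    seq = "".join(protein.split()).upper().rstrip("*")
    hits = []
    for start in range(len(seq) - window + 1):
        values = [KYTE_DOOLITTLE.get(res) for res in seq[start:start + window]]
        if None in values:
            continue
        mean = sum(values) / window
        if mean >= cutoff:
            hits.append((start + 1, round(mean, 2)))
    return hits


if __name__ == "__main__":
    demo = "K" * 10 + "L" * 19 + "K" * 10
    hits = hydrophobic_windows(demo)
    print(max(hits, key=lambda hit: hit[1]))  # (11, 3.8)
```

Overlapping hits such as those in the demo are normally merged into one candidate segment; dedicated predictors add further rules that this sketch omits.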



Example 46 

The following DNA sequence, believed to be complete, was identified in N. meningitidis <SEQ ID 391>:



1 ATGAACCACG ACATCACTTT CCTCACCCTG TTCCTACTCG GTkTCTTCGG 

51 CGGAAcGCAC TGCATCGGTA TGTGCGGCGG ATTAAGCAGC GcGTTTGs.s

101 TCCAACTCCC CCCGCATATC AACCGCTTTT GGCTGATCCT GCTGCTTAAC 

151 ACAGGACGGG TAAGCAGCTA TACGGCAAtC GGCCTGATAC TCGGATTAAT 

201 CGGACAGGTC GGCGTTTCAC TCGAcCAaAC CCGCGTCCTG CAGAATATTT 

251 TATACACGGC CGCCAACCTC CTGCTGCTCT TTTTAGGCTT ATACTTGAGC 

301 GGTATTTCTT CCTTGGCGGC AAAAATCGAG AAaATCGGCA AACCGATATG 

351 GCGGAACCTG AACCCGATAC TCAACCGGCT GTTACCCATA AAATCCATAC 

401 CCGCCTGCCT tGCGgTCGGA ATATTATGGG GCTGGCTGCC GTGCGGACTG 

451 GTTTACAGCG CGTCGCTTTA CGCGCTGGGA AgCGGTAGTG CGGCAACGGG 

501 CGGGTTATAT ATGCTTGCCT TTGCACTGGG TACGCTGCCC AATCTTtTAG 

551 CAATCGGCAT TTTtTCCCTG CAACTGAAwA AAATCATGCA AAACCGATAT 

601 ATCCGCCTGT GTACGGGATT ATCCGTATCA TTATGGGCAT TATGGAAACT 

651 TGCCGTCCTG TGGCTGTAA 

This corresponds to the amino acid sequence <SEQ ID 392; ORF103>: 

1 MNHDITFLTL FLLGXFGGTH CIGMCGGLSS AFXXQLPPHI NRFWLILLLN 

51 TGRVSSYTAI GLILGLIGQV GVSLDQTRVL QNILYTAANL LLLFLGLYLS 

101 GISSLAAKIE KIGKPIWRNL NPILNRLLPI KSIPACLAVG ILWGWLPCGL 

151 VYSASLYALG SGSAATGGLY MLAFALGTLP NLLAIGIFSL QLXKIMQNRY 

201 IRLCTGLSVS LWALWKLAVL WL* 

Further work elaborated the DNA sequence <SEQ ID 393> as: 



1 ATGAACCACG ACATCACTTT CCTCACCCTG TTCCTACTCG GTTTCTTCGG 

51 CGGAACGCAC TGCATCGGTA TGTGCGGCGG ATTAAGCAGC GCGTTTGCGC 

101 TCCAACTCCC CCCGCATATC AACCGCTTTT GGCTGATCCT GCTGCTTAAC 

151 ACAGGACGGG TAAGCAGCTA TACGGCAATC GGCCTGATAC TCGGATTAAT 

201 CGGACAGGTC GGCGTTTCAC TCGACCAAAC CCGCGTCCTG CAGAATATTT 

251 TATACACGGC CGCCAACCTC CTGCTGCTCT TTTTAGGCTT ATACTTGAGC 

301 GGTATTTCTT CCTTGGCGGC AAAAATCGAG AAAATCGGCA AACCGATATG 
351 GCGGAACCTG AACCCGATAC TCAACCGGCT GTTACCCATA AAATCCATAC

401 CCGCCTGCCT TGCGGTCGGA ATATTATGGG GCTGGCTGCC GTGCGGACTG 

451 GTTTACAGCG CGTCGCTTTA CGCGCTGGGA AGCGGTAGTG CGGCAACGGG 

501 CGGGTTATAT ATGCTTGCCT TTGCACTGGG TACGCTGCCC AATCTTTTAG 

551 CAATCGGCAT TTTTTCCCTG CAACTGAAAA AAATCATGCA AAACCGATAT 

601 ATCCGCCTGT GTACGGGATT ATCCGTATCA TTATGGGCAT TATGGAAACT 

651 TGCCGTCCTG TGGCTGTAA 

This corresponds to the amino acid sequence <SEQ ID 394; ORF103-1>: 



1 MNHDITFLTL FLLGFFGGTH CIGMCGGLSS AFALQLPPHI NRFWLILLLN

51 TGRVSSYTAI GLILGLIGQV GVSLDQTRVL QNILYTAANL LLLFLGLYLS

101 GISSLAAKIE KIGKPIWRNL NPILNRLLPI KSIPACLAVG ILWGWLPCGL

151 VYSASLYALG SGSAATGGLY MLAFALGTLP NLLAIGIFSL QLKKIMQNRY

201 IRLCTGLSVS LWALWKLAVL WL* 

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A) 

ORF103 shows 93.8% identity over a 222aa overlap with an ORF (ORF103a) from strain A of N. meningitidis:

10 20 30 40 50 60 

orf103.pep  MNHDITFLTLFLLGXFGGTHCIGMCGGLSSAFXXQLPPHINRFWLILLLNTGRVSSYTAI
II I I I I I I I I I I I I I I I I I I I I I I I I I II I II I I I I I I I I I I I I I I I I I I I I I I I 
orf103a     MNXDITFLTLFLLGFFGGTHCIGMCGGLSSAFALQLPPHINRXWLILLLNTGRVSSYTAI

10 20 30 40 50 60 



70 80 90 100 110 120 

orf103.pep  GLILGLIGQVGVSLDQTRVLQNILYTAANLLLLFLGLYLSGISSLAAKIEKIGKPIWRNL
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
orf103a     GLILGLIGQVGVSLDQTRVXQNILYTAANLLLLFLGLYLSGISSLAAKIEKIGKPIWRNL

70 80 90 100 110 120 



130 140 150 160 170 180 

orf103.pep  NPILNRLLPIKSIPACLAVGILWGWLPCGLVYSASLYALGSGSAATGGLYMLAFALGTLP
I I I I I I I I I I I I I I I I I 1 1 1 1 I I I I I I I I 1 1 ! I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
orf103a     NPILNRLLPIKSIPACLAVGILWGWLPCGLVYSASLYALGSGSAATGGLYMLAFALGTLP

130 140 150 160 170 180 



190 200 210 220 

orf103.pep  NLLAIGIFSLQLXKIMQNRYIRLCTGLSVSLWALWKLAVLWLX
11 1 I I I I I I I I I I I I I II I I II I I I I I I II I I I I I I I i I I 1 i 
orf103a     NLXAIGIFSLQLXKIMQNRYIRLCTGLSVSLWALWKLAVLWLX

190 200 210 220 

The complete length ORF103a nucleotide sequence <SEQ ID 395> is:



1 ATGAACCANG ACATCACTTT CCTCACCCTG TTCCTACTCG GTTTCTTCGG

51 CGGAACGCAC TGCATCGGTA TGTGCGGCGG ATTAAGCAGC GCGTTTGCGC

101 TCCAACTCCC CCCGCATATC AACCGCTTNT GGCTGATCCT GCTGCTTAAC

151 ACAGGACGGG TAAGCAGCTA TACGGCAATC GGCCTGATAC TCGGATTAAT

201 CGGACAGGTC GGCGTTTCAC TCGACCAAAC CCGCGTCNTG CAGAATATTT

251 TATACACGGC CGCCAACCTC CTGCTGCTCT TTTTAGGCTT ATACTTGAGC

301 GGTATTTCTT CCTTGGCGGC AAAAATCGAG AAAATCGGCA AACCGATATG

351 GCGGAACCTG AACCCGATAC TCAACCGGCT GTTACCCATA AAATCCATAC

401 CCGCCTGCCT TGCGGTCGGA ATATTATGGG GCTGGCTGCC GTGCGGACTA

451 GTTTACAGCG CGTCGCTTTA CGCGCTGGGA AGCGGTAGTG CGGCAACGGG

501 CGGGTTATAT ATGCTTGCCT TTGCACTGGG TACGCTGCCC AATCTTTNGG

551 CAATCGGCAT TTTTTCCCTG CAACTGNAAA AAATCATGCA AAACCGATAT

601 ATCCGCCTGT GTACGGGATT ATCCGTATCA TTATGGGCAT TATGGAAACT

651 TGCCGTCCTG TGGCTGTAA



This encodes a protein having amino acid sequence <SEQ ID 396>: 

1 MNXDITFLTL FLLGFFGGTH CIGMCGGLSS AFALQLPPHI NRXWLILLLN
51 TGRVSSYTAI GLILGLIGQV GVSLDQTRVX QNILYTAANL LLLFLGLYLS
101 GISSLAAKIE KIGKPIWRNL NPILNRLLPI KSIPACLAVG ILWGWLPCGL
151 VYSASLYALG SGSAATGGLY MLAFALGTLP NLXAIGIFSL QLXKIMQNRY
201 IRLCTGLSVS LWALWKLAVL WL* 

ORF103a and ORF103-1 show 97.7% identity in 222 aa overlap: 

10 20 30 40 50 60 

orf103a.pep MNXDITFLTLFLLGFFGGTHCIGMCGGLSSAFALQLPPHINRXWLILLLNTGRVSSYTAI
II I I I I I I I I I I I I I I I I I I I I I I I 1 I I I I I II I I I I I I I I I I 1 I I I I I I I I I I I I I I 
orf103-1    MNHDITFLTLFLLGFFGGTHCIGMCGGLSSAFALQLPPHINRFWLILLLNTGRVSSYTAI
10 20 30 40 50 60 

70 80 90 100 110 120 

orf103a.pep GLILGLIGQVGVSLDQTRVXQNILYTAANLLLLFLGLYLSGISSLAAKIEKIGKPIWRNL
I I II I I 1 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I II I I I I I II I I II 
orf103-1    GLILGLIGQVGVSLDQTRVLQNILYTAANLLLLFLGLYLSGISSLAAKIEKIGKPIWRNL
70 80 90 100 110 120 

130 140 150 160 170 180 

orf103a.pep NPILNRLLPIKSIPACLAVGILWGWLPCGLVYSASLYALGSGSAATGGLYMLAFALGTLP

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
orf103-1    NPILNRLLPIKSIPACLAVGILWGWLPCGLVYSASLYALGSGSAATGGLYMLAFALGTLP

130 140 150 160 170 180 

190 200 210 220 

orf103a.pep NLXAIGIFSLQLXKIMQNRYIRLCTGLSVSLWALWKLAVLWLX

II I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I 
orf103-1    NLLAIGIFSLQLKKIMQNRYIRLCTGLSVSLWALWKLAVLWLX

190 200 210 220 

Homology with a predicted ORF from N. gonorrhoeae

ORF103 shows 95.5% identity over a 222aa overlap with a predicted ORF (ORF103.ng) from N. 
gonorrhoeae: 

orf103.pep  MNHDITFLTLFLLGXFGGTHCIGMCGGLSSAFXXQLPPHINRFWLILLLNTGRVSSYTAI 60

I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I : I I I I I I 
orf103ng    MNHDITFLTLFLLGFFGGTHCIGMCGGLSSAFALQLPPHINRFWLILLLNTGRISSYTAI 60

orf103.pep  GLILGLIGQVGVSLDQTRVLQNILYTAANLLLLFLGLYLSGISSLAAKIEKIGKPIWRNL 120

11:11 I I [|:|:M III II Mill Ml: III III II II ill M MM M llll IMIM! 
orf103ng    GLMLGLIGQLGISLDQTRVLQNILYTASNLLLLFLGLYLSGISSLAAKIEKIGKPIWRNL 120

orf103.pep  NPILNRLLPIKSIPACLAVGILWGWLPCGLVYSASLYALGSGSAATGGLYMLAFALGTLP 180

I I I I I I I I I I I I II I I I I I I I I I I I I 1 I I I I I I I I I I I I I I I I I : I I I I I I I I I I I I I I I 
orf103ng    NPILNRLLPIKSIPACLAVGILWGWLPCGLVYSASLYALGSGSATTGGLYMLAFALGTLP 180

orf103.pep  NLLAIGIFSLQLXKIMQNRYIRLCTGLSVSLWALWKLAVLWL 222

I I I I I I I I I I I I II I I I I I I I I I I I I I I I II I I I I I I I I I I 
orf103ng    NLLAIGIFSLQLKKIMQNRYIRLCTGLSVSLWALWKLAVLWL 222

The complete length ORF103ng nucleotide sequence <SEQ ID 397> is: 

1 ATGAACCACG ACATCACTTT CCTCACCCTG TTCCTGCTCG GTTTCTTCGG 

51 CGGAACTCAC TGCATCGGTA TGTGCGGCGG ATTAAGCAGC GCGTTTGCGC 

101 TCCAACTCCC CCCGCATATC AACCGCTTTT GGCTGATTCT GCTGCTTAAC 

151 ACAGGACGGA TAAGCAGCTA TACGGCAATC GGCCTGATGC TCGGATTAAT 

201 CGGACAACTC GGCATTTCAC TCGACCAAAc ccgcgTCCTG CAAAATATTT 

251 tatacacagc ctccaaCCTC CTGCTGCTCT TTTTAGGCTT ATACTTGAGC 

301 GGTATTTCTT CCTTGGCGGC AAAAATCGAG AAAATCGGCA AACCGATATG 

351 GCGCAACCTG AACCCGATAC TCAACCGGCT GCTGCCCATA AAATCCATAC 

401 CCGCCTGCCT TGCTGTCGGA ATATTATGGG GCTGGCTGCC GTGCGGACTG 

451 GTTTACAGCG CATCACTTTA CGCGCTGGGA AGCGGTAGTG CGACAACCGG 

501 CGGACTGTAT ATGCTTGCCT TTGCACTGGG TACGCTGCCC AATCTTTTGG 

551 CAATCGGCAT TTTTTCCCTG CAACTGAAAA AAATCATGCA AAACCGATAT 

601 ATCCGCCTGT GTACAGGATT ATCCGTATCA TTATGGGCAT TATGGAAGCT 

651 TGCCGTCCTG TGGCTGTAA 






This encodes a protein having amino acid sequence <SEQ ID 398>: 
1 MNHDITFLTL FLLGFFGGTH CIGMCGGLSS AFALQLPPHI NRFWLILLLN

51 TGRISSYTAI GLMLGLIGQL GISLDQTRVL QNILYTASNL LLLFLGLYLS

101 GISSLAAKIE KIGKPIWRNL NPILNRLLPI KSIPACLAVG ILWGWLPCGL

151 VYSASLYALG SGSATTGGLY MLAFALGTLP NLLAIGIFSL QLKKIMQNRY

201 IRLCTGLSVS LWALWKLAVL WL* 

In addition, ORF103ng and ORF103-1 show 97.3% identity in 222 aa overlap: 

10 20 30 40 50 60 

orf103-1.pep MNHDITFLTLFLLGFFGGTHCIGMCGGLSSAFALQLPPHINRFWLILLLNTGRVSSYTAI
I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I : I I I I I I 
orf103ng     MNHDITFLTLFLLGFFGGTHCIGMCGGLSSAFALQLPPHINRFWLILLLNTGRISSYTAI

10 20 30 40 50 60 

70 80 90 100 110 120 

orf103-1.pep GLILGLIGQVGVSLDQTRVLQNILYTAANLLLLFLGLYLSGISSLAAKIEKIGKPIWRNL

I I :l I i I I I: hi II I I I II I I I I I I 1:1 I I I I I I I I Ml I I I I I i I M | | Mi I I I I I I 
orf103ng     GLMLGLIGQLGISLDQTRVLQNILYTASNLLLLFLGLYLSGISSLAAKIEKIGKPIWRNL

70 80 90 100 110 120 

130 140 150 160 170 180 

orf103-1.pep NPILNRLLPIKSIPACLAVGILWGWLPCGLVYSASLYALGSGSAATGGLYMLAFALGTLP
M M I M M M M M M M M M M M M M M M M II M I I I M M M M I I M I M I 
orf103ng     NPILNRLLPIKSIPACLAVGILWGWLPCGLVYSASLYALGSGSATTGGLYMLAFALGTLP

130 140 150 160 170 180 

190 200 210 220 

orf103-1.pep NLLAIGIFSLQLKKIMQNRYIRLCTGLSVSLWALWKLAVLWLX

II II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I 
orf103ng     NLLAIGIFSLQLKKIMQNRYIRLCTGLSVSLWALWKLAVLWLX

190 200 210 220 

Based on this analysis, including the presence of a putative leader sequence (double-underlined)
and several putative transmembrane domains (single-underlined) in the gonococcal protein, it is
predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be
useful antigens for vaccines or diagnostics, or for raising antibodies.



Example 47 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 399>: 

1 ATGGAAAACC AAAGGCCGCT CCTAGGCTTT CGCTTGGCAC TTTTGGCGGC 

51 GATGACGTGG GGAACGCTGC CGAT.TCCGT GCGGCAGGTA TTGAAGTTTG 

101 TCGATGCGCC GACGCTGGTG TGGGTGCGTT TTACCGTGGC GGCGGCGGTA 

151 TTGTTTGTTT TGCTGGCACT GGGCGGGCGG CTGCcGAAGC GGCGaGGATT 

201 TTTCTTGGTG CTCATTCAGG CTGCTGCTGC TCGGCGTGGC GGGCATTTCG 

251 GCAAACTTTG TGCTGATTGC CCAAGGGCTG CATTATATTT CGCCGACCAC 

301 GACGCAGGTT TTGTGGCAGA TTTCGCCGTT TACGATGATT GTwGTCGGTG 

351 TGTTGGTGTT TAAAGACCGG ATGACTGCCG CTCAGAAAAT CGGCTTGGTT 

401 TTGCTGCTTG CCGGTTTGCT TATGTATTTT AACGATAAAT TCGGCGAGTT 

451 GTCGGGTTTG GGCGCGTATG C.AAGGGCGT GTTGCTGTGT GCGGCAGGCA 

501 GTATGGCATG GGTGTGTAAT GCCGTGGCGC AAAAGCTGCT GTCGGCGCAA 

551 TTCGGGCCGC AACAGATTCT GCTGTTGATT TATGCGGCAA GTGCCGCCGT 

601 GTTCCTGCCG TTTGCCGAAC CGGCACACAT CGGAAGTATG GACGGTACGT 

651 TGGCGTGGGT ATGTATTGCG TATTGCTGCT TGAATACGTT AATCGGTTAC 

701 GGCTCGTTCG GCGAGGCGTT GAAACATTGG GAGGCTTCCA AAGTCAGCGC 

751 GGTAACAACC TTGCTCCCCG TGTTTACCGT AATAAATACT TTGCTCGGGC 

801 ATTATGTGAT GCCTGAAACT TTTGCCGCGC CGGA. . 

This corresponds to the amino acid sequence <SEQ ID 400; ORF104>: 

1 MENQRPLLGF RLALLAAMTW GTLPXSVRQV LKFVDAPTLV WVRFTVAAAV 

51 LFVLLALGGR LPKRRDFSWC SFRLLLLGVA GISANFVLIA QGLHYISPTT 

101 TQVLWQISPF TMIWGVLVF KDRMTAAQKI GLVLLLAGLL MYFNDKFGEL 

151 SGLGAYXKGV LLCAAGSMAW VCNAVAQKLL SAQFGPQQIL LLIYAASAAV 

201 FLPFAEPAHI GSMDGTLAWV CIAYCCLNTL IGYGSFGEAL KHWEASKVSA 
251 VTTLLPVFTV INTLLGHYVM PETFAAP...
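Each amino acid listing in these examples is a reading of the preceding nucleotide sequence, taken three bases at a time. The sketch below illustrates such a conceptual translation. It assumes the standard genetic code (NCBI translation table 1) and reading frame 1. The names `translate` and `translate_codon` are illustrative and are not taken from the original disclosure. IUPAC ambiguity codes (for example w = A or T) are resolved when every possible codon encodes the same residue; otherwise the codon is reported as X.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}

# IUPAC nucleotide codes and the bases each one may stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def translate_codon(codon: str) -> str:
    """Return the amino acid for a codon, or 'X' if it cannot be resolved uniquely."""
    choices = [IUPAC.get(base, "") for base in codon.upper()]
    if not all(choices):  # an unknown symbol such as '.'
        return "X"
    residues = {CODON_TABLE["".join(bases)] for bases in product(*choices)}
    return residues.pop() if len(residues) == 1 else "X"


def translate(listing: str) -> str:
    """Translate frame 1 of a sequence listing, ignoring position numbers and spaces."""
    seq = "".join(ch for ch in listing if not ch.isspace() and not ch.isdigit())
    return "".join(translate_codon(seq[i:i + 3]) for i in range(0, len(seq) - 2, 3))


print(translate("1 ATGGAAAACC AAAGGCCGCT CCTAGGCTTT"))  # MENQRPLLGF
```

The last 114 bases of SEQ ID 403 (from position 811) translate to PDTFAAPDMNGLGYAGALVVVGGAVTAAVGDRLFKRR*, which is residues 271 to 307 of ORF104a followed by the stop codon.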

Further work revealed further partial DNA sequence <SEQ ID 401>:

1 ATGGAAAACC AAAGGCCGCT CCTAGGCTTC GCGTTGGCAC TTTTGGCGGC 

51 GATGACGTGG GGAACGCTGC CGATTGCCGT GCGGCAGGTA TTGAAGTTTG 

101 TCGATGCGCC GACGCTGGTG TGGGTGCGTT TTACCGTGGC GGCGGCGGTA 

151 TTGTTTGTTT TGCTGGCACT GGGCGGGCGG CTGCCGAAGC GGCGGGATTT 

201 TTCTTGGTGC TCATTCAGGC TGCTGCTGCT CGGCGTGGCG GGCATTTCGG 

251 CAAACTTTGT GCTGATTGCC CAAGGGCTGC ATTATATTTC GCCGACCACG 

301 ACGCAGGTTT TGTGGCAGAT TTCGCCGTTT ACGATGATTG TTGTCGGTGT 

351 GTTGGTGTTT AAAGACCGGA TGACTGCCGC TCAGAAAATC GGCTTGGTTT 

401 TGCTGCTTGC CGGTTTGCTT ATGTTTTTTA ACGATAAATT CGGCGAGTTG 

451 TCGGGTTTGG GCGCGTATGC GAAGGGCGTG TTGCTGTGTG CGGCAGGCAG 

501 TATGGCATGG GTGTGTTATG CCGTGGCGCA AAAGCTGCTG TCGGCGCAAT 

551 TCGGGCCGCA ACAGATTCTG CTGTTGATTT ATGCGGCAAG TGCCGCCGTG 

601 TTCCTGCCGT TTGCCGAACC GGCACACATC GGAAGTTTGG ACGGTACGTT 

651 GGCGTGGGTT TGTTTTGCGT ATTGCTGCTT GAATACGTTA ATCGGTTACG 

701 GCTCGTTCGG CGAGGCGTTG AAACATTGGG AGGCTTCCAA AGTCAGCGCG 

751 GTAACAACCT TGCTCCCCGT GTTTACCGTA ATAwTwwCTT TGCTCGGGCA 

801 TTATGTGATG CCTGAAACTT TTGCCGCGCC GGA...

This corresponds to the amino acid sequence <SEQ ID 402; ORF104-1>: 

1 MENQRPLLGF ALALLAAMTW GTLPIAVRQV LKFVDAPTLV WVRFTVAAAV

51 LFVLLALGGR LPKRRDFSWC SFRLLLLGVA GISANFVLIA QGLHYISPTT

101 TQVLWQISPF TMIVVGVLVF KDRMTAAQKI GLVLLLAGLL MFFNDKFGEL

151 SGLGAYAKGV LLCAAGSMAW VCYAVAQKLL SAQFGPQQIL LLIYAASAAV

201 FLPFAEPAHI GSLDGTLAWV CFAYCCLNTL IGYGSFGEAL KHWEASKVSA

251 VTTLLPVFTV IXXLLGHYVM PETFAAP...

Computer analysis of this amino acid sequence gave the following results: 

Homology with hypothetical HI0878 protein of H. influenzae (accession number U32769) 
ORF104 and HI0878 show 40% aa identity in 277aa overlap: 



orf104     4 QRPLLGFRLALLAAMTWGTLPXSVRQVLKFVDAPTLVWXXXXXXXXXXXXXXXXXXXXP- 62
             Q+PLLGF  AL+ AM WG+LP +++QVL  ++A T+VW                    P
HI0878     3 QQPLLGFTFALITAMAWGSLPIALKQVLSVMNAQTIVWYRFIIAAVSLLALLAYKKQLPE 62

orf104    63 --KRRDFSWCSFRLLLLGVAGISANFVLIAQGLHYISPTTTQVLWQISPFTMIVVGVLVF 120
               K R ++W    ++L+GV G+++NF+L +  L+YI P+  Q+   +S F M++ GVL+F
HI0878    63 LMKVRQYAW----IMLIGVIGLTSNFLLFSSSLNYIEPSVAQIFIHLSSFGMLICGVLIF 118

orf104   121 KDRMTAAQKIXXXXXXXXXXMYFNDKFGELSGLGAYXKGVLLCAAGSMAWVCNAVAQKLL 180
             K+++   QKI          ++FND+F   +GL  Y  GV+L   G++ WV   +AQKL+
HI0878   119 KEKLGLHQKIGLFLLLIGLGLFFNDRFDAFAGLNQYSTGVILGVGGALIWVAYGMAQKLM 178

orf104   181 SAQFGPQQILLLIYAASAAVFLPFAEPAHIGSMDGTLAWVCIAYCCLNTLIGYGSFGEAL 240
               +F  QQILL++Y   A  F+P A+ + +  +   LA +C  YCCLNTLIGYGS+ EAL
HI0878   179 LRKFNSQQILLMMYLGCAIAFMPMADFSQVQELT-PLALICFIYCCLNTLIGYGSYAEAL 237

orf104   241 KHWEASKVSAVTTLLPVFTVINTLLGHYVMPETFAAP 277
               W+ SKVS V TL+P+FT++ + + HY  P  FAAP
HI0878   238 NRWDVSKVSVVITLVPLFTILFSHIAHYFSPADFAAP 274





Homology with a predicted ORF from N. meningitidis (strain A)

ORF104 shows 95.3% identity over a 277aa overlap with an ORF (ORF104a) from strain A of N. 
meningitidis: 

                      10        20        30        40        50        60
orf104.pep    MENQRPLLGFRLALLAAMTWGTLPXSVRQVLKFVDAPTLVWVRFTVAAAVLFVLLALGGR
              |||||||||| ||||||||||||| :||||||||||||||||||||||||||||||||||
orf104a       MENQRPLLGFALALLAAMTWGTLPIAVRQVLKFVDAPTLVWVRFTVAAAVLFVLLALGGR
                      10        20        30        40        50        60

                      70        80        90       100       110       120
orf104.pep    LPKRRDFSWCSFRLLLLGVAGISANFVLIAQGLHYISPTTTQVLWQISPFTMIVVGVLVF
              ||| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf104a       LPKWRDFSWCSFRLLLLGVAGISANFVLIAQGLHYISPTTTQVLWQISPFTMIVVGVLVF
                      70        80        90       100       110       120

                     130       140       150       160       170       180
orf104.pep    KDRMTAAQKIGLVLLLAGLLMYFNDKFGELSGLGAYXKGVLLCAAGSMAWVCNAVAQKLL
              |||||||||||||||||||||:|||||||||||||| ||||||||||||||| |||||||
orf104a       KDRMTAAQKIGLVLLLAGLLMFFNDKFGELSGLGAYAKGVLLCAAGSMAWVCYAVAQKLL
                     130       140       150       160       170       180

                     190       200       210       220       230       240
orf104.pep    SAQFGPQQILLLIYAASAAVFLPFAEPAHIGSMDGTLAWVCIAYCCLNTLIGYGSFGEAL
              |||||||||||||||||||||||||| |||||:||||||||:||||||||||||||||||
orf104a       SAQFGPQQILLLIYAASAAVFLPFAELAHIGSLDGTLAWVCFAYCCLNTLIGYGSFGEAL
                     190       200       210       220       230       240

                     250       260       270
orf104.pep    KHWEASKVSAVTTLLPVFTVINTLLGHYVMPETFAAP
              ||||||||||||||||||||| :||||||||:|||||
orf104a       KHWEASKVSAVTTLLPVFTVIFSLLGHYVMPDTFAAPDMNGLGYAGALVVVGGAVTAAVG
                     250       260       270       280       290       300
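The identity figures quoted with these alignments are the number of identical aligned residues divided by the overlap length; for example, ORF104 and ORF104a show 95.3% over a 277 aa overlap. The sketch below computes that ratio for two rows of the same pairwise alignment. As a simplifying assumption, '-' marks a gap and every column, gaps included, counts towards the overlap. The function name `percent_identity` is illustrative, and the source does not state the aligning program's exact conventions.

```python
def percent_identity(row_a: str, row_b: str, gap: str = "-") -> float:
    """Percentage of aligned columns in which both rows carry the same residue.

    Both rows must come from the same pairwise alignment (equal length); every
    column counts towards the overlap, so gap columns lower the identity.
    """
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have equal length")
    if not row_a:
        raise ValueError("empty alignment")
    identical = sum(1 for a, b in zip(row_a, row_b) if a == b and a != gap)
    return 100.0 * identical / len(row_a)


print(round(percent_identity("MENQRPLLGFR", "MENQRPLLGFA"), 1))  # 90.9
```

For the two 277-residue rows of the ORF104/ORF104a alignment above, 264 of the 277 columns are identical. That gives 95.3%.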

The complete length ORF104a nucleotide sequence <SEQ ID 403> is: 

1 ATGGAAAACC AAAGGCCGCT CCTAGGCTTC GCGTTGGCAC TTTTGGCGGC 

51 GATGACGTGG GGAACGCTGC CGATTGCCGT GCGGCAGGTA TTGAAGTTTG 

101 TCGATGCGCC GACGCTGGTG TGGGTGCGTT TTACCGTGGC GGCGGCGGTA 

151 TTGTTTGTTT TGCTGGCATT GGGCGGGCGG CTGCCGAAGT GGCGGGATTT 

201 TTCTTGGTGC TCATTCAGGC TGCTGCTGCT CGGCGTGGCG GGCATTTCGG 

251 CAAACTTTGT GCTGATTGCC CAAGGGCTGC ATTATATTTC GCCGACCACG 

301 ACGCAGGTTT TGTGGCAGAT TTCGCCGTTT ACGATGATTG TTGTCGGTGT 

351 GTTGGTGTTT AAAGACCGGA TGACTGCCGC TCAGAAAATC GGCTTGGTTT 

401 TGCTGCTTGC CGGTTTGCTT ATGTTTTTTA ACGATAAATT CGGCGAGTTG 

451 TCGGGTTTGG GCGCGTATGC GAAGGGCGTG TTGCTGTGTG CGGCAGGCAG 

501 TATGGCATGG GTGTGTTATG CCGTGGCGCA AAAGCTGCTG TCGGCGCAAT 

551 TCGGGCCGCA ACAGATTCTG CTGTTGATTT ATGCGGCAAG TGCCGCCGTG 

601 TTCCTGCCGT TTGCCGAACT GGCACACATC GGAAGTTTGG ACGGTACGTT 

651 GGCGTGGGTT TGTTTTGCGT ATTGCTGCTT GAATACGTTA ATCGGTTACG 

701 GCTCGTTCGG CGAGGCGTTG AAACATTGGG AGGCTTCCAA AGTCAGCGCG 

751 GTAACAACCT TGCTCCCCGT GTTTACCGTA ATATTTTCTT TGCTCGGGCA 

801 TTATGTGATG CCTGATACTT TTGCCGCGCC GGATATGAAC GGTTTGGGTT 

851 ATGCCGGCGC ACTGGTCGTG GTCGGGGGTG CGGTTACGGC GGCGGTGGGG 

901 GACAGGCTGT TCAAACGCCG CTAG 

This encodes a protein having amino acid sequence <SEQ ID 404>: 

1 MENQRPLLGF ALALLAAMTW GTLPIAVRQV LKFVDAPTLV WVRFTVAAAV

51 LFVLLALGGR LPKWRDFSWC SFRLLLLGVA GISANFVLIA QGLHYISPTT

101 TQVLWQISPF TMIVVGVLVF KDRMTAAQKI GLVLLLAGLL MFFNDKFGEL

151 SGLGAYAKGV LLCAAGSMAW VCYAVAQKLL SAQFGPQQIL LLIYAASAAV

201 FLPFAELAHI GSLDGTLAWV CFAYCCLNTL IGYGSFGEAL KHWEASKVSA

251 VTTLLPVFTV IFSLLGHYVM PDTFAAPDMN GLGYAGALVV VGGAVTAAVG

301 DRLFKRR*

ORF104a and ORF104-1 show 98.2% identity in 277 aa overlap: 

                      10        20        30        40        50        60
orf104a.pep   MENQRPLLGFALALLAAMTWGTLPIAVRQVLKFVDAPTLVWVRFTVAAAVLFVLLALGGR
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf104-1      MENQRPLLGFALALLAAMTWGTLPIAVRQVLKFVDAPTLVWVRFTVAAAVLFVLLALGGR
                      10        20        30        40        50        60

                      70        80        90       100       110       120
orf104a.pep   LPKWRDFSWCSFRLLLLGVAGISANFVLIAQGLHYISPTTTQVLWQISPFTMIVVGVLVF
              ||| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf104-1      LPKRRDFSWCSFRLLLLGVAGISANFVLIAQGLHYISPTTTQVLWQISPFTMIVVGVLVF
                      70        80        90       100       110       120

                     130       140       150       160       170       180
orf104a.pep   KDRMTAAQKIGLVLLLAGLLMFFNDKFGELSGLGAYAKGVLLCAAGSMAWVCYAVAQKLL
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf104-1      KDRMTAAQKIGLVLLLAGLLMFFNDKFGELSGLGAYAKGVLLCAAGSMAWVCYAVAQKLL
                     130       140       150       160       170       180

                     190       200       210       220       230       240
orf104a.pep   SAQFGPQQILLLIYAASAAVFLPFAELAHIGSLDGTLAWVCFAYCCLNTLIGYGSFGEAL
              |||||||||||||||||||||||||| |||||||||||||||||||||||||||||||||
orf104-1      SAQFGPQQILLLIYAASAAVFLPFAEPAHIGSLDGTLAWVCFAYCCLNTLIGYGSFGEAL
                     190       200       210       220       230       240

                     250       260       270       280       290       300
orf104a.pep   KHWEASKVSAVTTLLPVFTVIFSLLGHYVMPDTFAAPDMNGLGYAGALVVVGGAVTAAVG
              |||||||||||||||||||||  ||||||||:|||||
orf104-1      KHWEASKVSAVTTLLPVFTVIXXLLGHYVMPETFAAP
                     250       260       270

Homology with a predicted ORF from N. gonorrhoeae 

ORF104 shows 93.9% identity over a 277aa overlap with a predicted ORF (ORF104.ng) from N. 
gonorrhoeae: 

orf104.pep    MENQRPLLGFRLALLAAMTWGTLPXSVRQVLKFVDAPTLVWVRFTVAAAVLFVLLALGGR 60
              |||||||||| ||||||||||||| :||||||||||||||||||||||||||||||||||
orf104ng      MENQRPLLGFALALLAAMTWGTLPIAVRQVLKFVDAPTLVWVRFTVAAAVLFVLLALGGR 60

orf104.pep    LPKRRDFSWCSFRLLLLGVAGISANFVLIAQGLHYISPTTTQVLWQISPFTMIVVGVLVF 120
              ||||||||| |||||||||:||||||||||||||||||||||||||||||||||||||||
orf104ng      LPKRRDFSWHSFRLLLLGVTGISANFVLIAQGLHYISPTTTQVLWQISPFTMIVVGVLVF 120

orf104.pep    KDRMTAAQKIGLVLLLAGLLMYFNDKFGELSGLGAYXKGVLLCAAGSMAWVCNAVAQKLL 180
              ||||||||||||||||:||||:|||||||||||||| ||||||||||||||| |||||||
orf104ng      KDRMTAAQKIGLVLLLVGLLMFFNDKFGELSGLGAYAKGVLLCAAGSMAWVCYAVAQKLL 180

orf104.pep    SAQFGPQQILLLIYAASAAVFLPFAEPAHIGSMDGTLAWVCIAYCCLNTLIGYGSFGEAL 240
              ||||||||||||||||||||||  ||||||||:||||||||::|||||||||||||||||
orf104ng      SAQFGPQQILLLIYAASAAVFLLXAEPAHIGSLDGTLAWVCFVYCCLNTLIGYGSFGEAL 240

orf104.pep    KHWEASKVSAVTTLLPVFTVINTLLGHYVMPETFAAP 277
              ||||||||||||||||||||| :||||||||:|||||
orf104ng      KHWEASKVSAVTTLLPVFTVIFSLLGHYVMPDTFAAPDMNGLGYVGALVVVGGAVTAAVG 300

The complete length ORF104ng nucleotide sequence <SEQ ID 405> is predicted to encode a 
protein having amino acid sequence <SEQ ID 406>: 

1 MENQRPLLGF ALALLAAMTW GTLPIAVRQV LKFVDAPTLV WVRFTVAAAV

51 LFVLLALGGR LPKRRDFSWH SFRLLLLGVT GISANFVLIA QGLHYISPTT

101 TQVLWQISPF TMIVVGVLVF KDRMTAAQKI GLVLLLVGLL MFFNDKFGEL

151 SGLGAYAKGV LLCAAGSMAW VCYAVAQKLL SAQFGPQQIL LLIYAASAAV

201 FLLXAEPAHI GSLDGTLAWV CFVYCCLNTL IGYGSFGEAL KHWEASKVSA

251 VTTLLPVFTV IFSLLGHYVM PDTFAAPDMN GLGYVGALVV VGGAVTAAVG

301 DRPFKRR*

Further work revealed the complete gonococcal nucleotide sequence <SEQ ID 407>: 

1 ATGGAAAACC AAAGGCCGCT CCTAGGCTTC GCGTTGGCAC TTTTGGCGGC 

51 GATGACGTGG GGGACGCTGC CGATTGCCGT GCGGCAGGTA TTGAAGTTTG 

101 TCGATGCGCC GACGCTGGTG TGGGTGCGTT TTACCGTGGC GGCGGCGGTA 

151 TTGTTTGTTT TGCTGGCATT GGGCGGGCGG CTGCCGAAGC GGCGGGATTT 

201 TTCTTGGCAT TCATTCAGGC TGCTGCTGCT CGGCGTGACG GGCATTTCGG 

251 CAAACTTTGT GCTGATTGCC CAAGGGCTGC ATTATATTTC GCCGACCACG 

301 ACGCAGGTTT TGTGGCAGAT TTCGCCGTTT ACGATGATTG TTGTCGGCGT 

351 GTTGGTGTTT AAAGACCGGA tgaCTGCCGC GCAGAAAATC GGTTTGGTTT 

401 TGCTGCttgT CGGTttgCTT ATGTTTTtta ACGACAAATT CGGCGAGTTG 

451 TCGGGTTTGG GCGCGTATGC GAAGGGCGTG TTGCTGTGTG CGGCAGGCAG 

501 TATGGCCTGG GTGTGTTATG CCGTGGCGCA AAAGCTGCTG TCGGCGCAAT 

551 TCGGGCCGCA ACAGATTCTG CTGTTGATTT ATGCGGcaag tgccgccGTG 

601 TTCCtgccgT TTGccgaaCC GGCACACATC GGAAGTTTgg aCGGTACGtt 

651 GGCGTGGGTT TGTTTTGTGT ATTGCTGCTT GAATACGTTA ATCGGTTACG 



701 GCTCGTTCGG CGAGGCGTTG AAACATTGGG AGGCTTCCAA AGTCAGCGCG

751 GTAACAACCT TGCTCCCCGT GTTTACCGTA ATATTTTCTT TGCTCGGGCA 

801 TTATGTGATG CCTGATACTT TTGCCGCGCC GGATATGAAC GGTTTGGGTT 

851 ATGTCGGCGC ACTGGTCGTG GTCGGGGGTG CGGTTACGGC GGCGGTGGGG 

901 GACAGGCCGT TCAAACGCCG CTAG 

This corresponds to the amino acid sequence <SEQ ID 408; ORF104ng-1>:

1 MENQRPLLGF ALALLAAMTW GTLPIAVRQV LKFVDAPTLV WVRFTVAAAV

51 LFVLLALGGR LPKRRDFSWH SFRLLLLGVT GISANFVLIA QGLHYISPTT

101 TQVLWQISPF TMIVVGVLVF KDRMTAAQKI GLVLLLVGLL MFFNDKFGEL

151 SGLGAYAKGV LLCAAGSMAW VCYAVAQKLL SAQFGPQQIL LLIYAASAAV

201 FLPFAEPAHI GSLDGTLAWV CFVYCCLNTL IGYGSFGEAL KHWEASKVSA

251 VTTLLPVFTV IFSLLGHYVM PDTFAAPDMN GLGYVGALVV VGGAVTAAVG

301 DRPFKRR*



ORF104ng-1 and ORF104-1 show 97.5% identity in 277 aa overlap:

                      10        20        30        40        50        60
orf104-1.pep  MENQRPLLGFALALLAAMTWGTLPIAVRQVLKFVDAPTLVWVRFTVAAAVLFVLLALGGR
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf104ng-1    MENQRPLLGFALALLAAMTWGTLPIAVRQVLKFVDAPTLVWVRFTVAAAVLFVLLALGGR
                      10        20        30        40        50        60

                      70        80        90       100       110       120
orf104-1.pep  LPKRRDFSWCSFRLLLLGVAGISANFVLIAQGLHYISPTTTQVLWQISPFTMIVVGVLVF
              ||||||||| |||||||||:||||||||||||||||||||||||||||||||||||||||
orf104ng-1    LPKRRDFSWHSFRLLLLGVTGISANFVLIAQGLHYISPTTTQVLWQISPFTMIVVGVLVF
                      70        80        90       100       110       120

                     130       140       150       160       170       180
orf104-1.pep  KDRMTAAQKIGLVLLLAGLLMFFNDKFGELSGLGAYAKGVLLCAAGSMAWVCYAVAQKLL
              ||||||||||||||||:|||||||||||||||||||||||||||||||||||||||||||
orf104ng-1    KDRMTAAQKIGLVLLLVGLLMFFNDKFGELSGLGAYAKGVLLCAAGSMAWVCYAVAQKLL
                     130       140       150       160       170       180

                     190       200       210       220       230       240
orf104-1.pep  SAQFGPQQILLLIYAASAAVFLPFAEPAHIGSLDGTLAWVCFAYCCLNTLIGYGSFGEAL
              ||||||||||||||||||||||||||||||||||||||||||:|||||||||||||||||
orf104ng-1    SAQFGPQQILLLIYAASAAVFLPFAEPAHIGSLDGTLAWVCFVYCCLNTLIGYGSFGEAL
                     190       200       210       220       230       240

                     250       260       270
orf104-1.pep  KHWEASKVSAVTTLLPVFTVIXXLLGHYVMPETFAAP
              |||||||||||||||||||||  ||||||||:|||||
orf104ng-1    KHWEASKVSAVTTLLPVFTVIFSLLGHYVMPDTFAAPDMNGLGYVGALVVVGGAVTAAVG
                     250       260       270       280       290       300






In addition, ORF104ng-1 shows significant homology with a hypothetical H. influenzae protein:

gi|1573895 (U32769) hypothetical [Haemophilus influenzae] Length = 306
Score = 237 bits (598), Expect = 8e-62
Identities = 114/280 (40%), Positives = 168/280 (59%), Gaps = 8/280 (2%)

Query: 30  QRPXXXXXXXXXXXMTWGTLPIAVRQVLKFVDAPTLVWXXXXXXXXXXXXXXXXXXXXP- 88
           Q+P           M WG+LPIA++QVL  ++A T+VW                    P
Sbjct: 3   QQPLLGFTFALITAMAWGSLPIALKQVLSVMNAQTIVWYRFIIAAVSLLALLAYKKQLPE 62

Query: 89  --KRRDFSWHSFRLLLLGVTGISANFVLIAQGLHYISPTTTQVLWQISPFTMIVVGVLVF 146
             K R ++W    ++L+GV G+++NF+L +  L+YI P+  Q+   +S F M++ GVL+F
Sbjct: 63  LMKVRQYAW----IMLIGVIGLTSNFLLFSSSLNYIEPSVAQIFIHLSSFGMLICGVLIF 118

Query: 147 KDRMTAAQKIXXXXXXXXXXMFFNDKFGELSGLGAYAKGVLLCAAGSMAWVCYAVAQKLL 206
           K+++   QKI          +FFND+F   +GL  Y+ GV+L   G++ WV Y +AQKL+
Sbjct: 119 KEKLGLHQKIGLFLLLIGLGLFFNDRFDAFAGLNQYSTGVILGVGGALIWVAYGMAQKLM 178

Query: 207 SAQFGPQQILLLIYAASAAVFLPFAEPAHIGSLDGTLAWVCFVYCCLNTLIGYGSFGEAL 266
             +F  QQILL++Y   A  F+P A+ + +  L   LA +CF+YCCLNTLIGYGS+ EAL
Sbjct: 179 LRKFNSQQILLMMYLGCAIAFMPMADFSQVQELT-PLALICFIYCCLNTLIGYGSYAEAL 237
Query: 267 KHWEASKVSAVTTLLPVFTVIFSLLGHYVMPDTFAAPDMN 306
             W+ SKVS V TL+P+FT++FS + HY  P  FAAP++N
Sbjct: 238 NRWDVSKVSVVITLVPLFTILFSHIAHYFSPADFAAPELN 277
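The alignment header reports both a raw score (598) and a bit score (237). BLAST relates the two through Karlin-Altschul statistics: a raw score S normalises to bits as S' = (λS − ln K)/ln 2, and the expected number of chance hits is E = m·n·2^(−S') for effective query and database lengths m and n. The sketch below encodes both relations. The parameters λ = 0.270 and K = 0.047 are assumptions, chosen here only because they reproduce the reported 237 bits. The source states neither these parameters nor the effective search space, so the reported Expect value of 8e-62 is not recomputed.

```python
import math


def bit_score(raw_score: float, lam: float, k: float) -> float:
    """Normalised score in bits: S' = (lambda * S - ln K) / ln 2."""
    return (lam * raw_score - math.log(k)) / math.log(2)


def expect_value(bits: float, query_len: int, db_len: int) -> float:
    """Expected chance hits with at least this bit score: E = m * n * 2**(-S')."""
    return query_len * db_len * 2.0 ** (-bits)


# Hypothetical parameters, not given in the source: lambda = 0.270, K = 0.047.
print(round(bit_score(598, lam=0.270, k=0.047)))  # 237
```

Bit scores do not depend on the scoring system, so they can be compared across searches. The E-value additionally scales with the size of the search space.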

Based on this analysis, including the presence of a putative leader sequence and several putative
transmembrane domains in the gonococcal protein, it is predicted that the proteins from
N. meningitidis and N. gonorrhoeae, and their epitopes, could be useful antigens for vaccines or
diagnostics, or for raising antibodies.

Example 48 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 409>: 

1 ATGGTAGCTC GTCGGGCTCA TAACCCGAAG GTCGTAGGTT CGAATCCTGT 

51 .CCCGCAACC TAATTTCAAA CCCCTCGGTT CAATGCCGAG GG.GTTTTGT

101 T.TTGCCTGT TTCCTGTTTC CTGTTTCCTG CCGCCTCCGT TTTTTGCCGG 

151 ATTTTCCTTC CGGCCGCAAT ATCGGAACGG CAGACCGCCG TCTGTTTGCG 

201 GTTGCAAATT CAGGCAGTTT GGCTACAATC TTCCGCATTG TCTTCAAGAA 

251 AGCCAACCAT GCCGACCGTC CGTTTTACCG AATCCGTCAG CAAACAAGAC 

301 CTTGATGCTC TGTTCGAGTG GGCAAAAGCA AGTTACGGTG CAGAAAGTTG 

351 CTGGAAAACG CTGTATCTGA ACGGTCysCC TTTGGGCAAC CTGTCGCCGG 

401 AATGGGTGGA ACGCGTsmmA AAAGACTGGG AGGCAGGCTG CyCGGAGTCT 

451 TCAGACGGCA TTTTTCTGAA TgCGGACGGc TGgCctGATA TGGgCGGAcg 

501 cTTACAGCAC CTCGCCCTCG GTTGGCACTG TGCGGGGCTG TTGGACGgsT 

551 GGCGCAACGA GTGTTTCGAC CTGACCGACG GCGGCGGCAA CCCCTTGTTC 

601 ACGCTCGaAc GCGCCGyTTT mCGTCCTkTC GGACTGCTCA GCCGCGCCGT 

651 CCATCTCAAC GGTCTGACCG AATCGGACGG CCGATGGCAT TTCTGGATAG 

701 GCAGGCGCAG TCCGCACAAA GCAGTCGATC CCAACAAACT CGACAATACT 

751 rCCGCCGGCG GTGTTTCCGG CGGCGAAATG CCGTCTGAAG CCGTGTGTCG 

801 CGAAAGCAGC GAAGAAGCCG GTTTGGATAA AACGCTGcTT CCGCTCATCC 

851 GCCCGGTATC GCAGCTGCAC AGCCTGCGCT CCGTCAGCCG GGGTGTACAC 

901 AATGAAATCC TGTATGTATT CGATGCCGTC CTGCCG... 

This corresponds to the amino acid sequence <SEQ ID 410; ORF105>: 



1 MVARRAHNPK VVGSNPXPAT XFQTPRFNAE XVLXLPVSCF LFPAASVFCR 

51 IFLPAAISER QTAVCLRLQI QAVWLQSSAL SSRKPTMPTV RFTESVSKQD 

101 LDALFEWAKA SYGAESCWKT LYLNGXPLGN LSPEWVERVX KDWEAGCXES 

151 SDGIFLNADG WPDMGGRLQH LALGWHCAGL LDGWRNECFD LTDGGGNPLF 

201 TLERAXXRPX GLLSRAVHLN GLTESDGRWH FWIGRRSPHK AVDPNKLDNT 

251 XAGGVSGGEM PSEAVCRESS EEAGLDKTLL PLIRPVSQLH SLRSVSRGVH 

301 NEILYVFDAV LP...

Further work revealed the complete nucleotide sequence <SEQ ID 411>:



1 ATGCCGACCG TCCGTTTTAC CGAATCCGTC AGCAAACAAG ACCTTGATGC 

51 TCTGTTCGAG TGGGCAAAAG CAAGTTACGG TGCAGAAAGT TGCTGGAAAA 

101 CGCTGTATCT GAACGGTCTG CCTTTGGGCA ACCTGTCGCC GGAATGGGTG 

151 GAACGCGTCA AAAAAGACTG GGAGGCAGGC TGCTCGGAGT CTTCAGACGG 

201 CATTTTTCTG AATGCGGACG GCTGGCCTGA TATGGGCGGA CGCTTACAGC 

251 ACCTCGCCCT CGGTTGGCAC TGTGCGGGGC TGTTGGACGG CTGGCGCAAC 

301 GAGTGTTTCG ACCTGACCGA CGGCGGCGGC AACCCCTTGT TCACGCTCGA 

351 ACGCGCCGCT TTCCGTCCTT TCGGACTGCT CAGCCGCGCC GTCCATCTCA 

401 ACGGTCTGAC CGAATCGGAC GGCCGATGGC ATTTCTGGAT AGGCAGGCGC 

451 AGTCCGCACA AAGCAGTCGA TCCCAACAAA CTCGACAATA CTGCCGCCGG

501 CGGTGTTTCC GGCGGCGAAA TGCCGTCTGA AGCCGTGTGT CGCGAAAGCA 

551 GCGAAGAAGC CGGTTTGGAT AAAACGCTGC TTCCGCTCAT CCGCCCGGTA 

601 TCGCAGCTGC ACAGCCTGCG CTCCGTCAGC CGGGGTGTAC ACAATGAAAT 

651 CCTGTATGTA TTCGATGCCG TCCTGCCCGA AACCTTCCTG CCTGAAAATC 

701 AGGATGGCGA AGTGGCGGGT TTTGAGAAAA TGGACATCGG CGGTCTGTTG 

751 GATGCCATGT TGTCGGGAAA CATGATGCAC GACGCGCAAC TGGTTACGCT 

801 GGACGCGTTT TGCCGTTACG GTCTGATTGA TGCCGCCCAT CCGCTGTCCG 

851 AGTGGCTGGA CGGCATACGT TTATAG

This corresponds to the amino acid sequence <SEQ ID 412; ORF105-1>: 
1 MPTVRFTESV SKQDLDALFE WAKASYGAES CWKTLYLNGL PLGNLSPEWV

51 ERVKKDWEAG CSESSDGIFL NADGWPDMGG RLQHLALGWH CAGLLDGWRN 

101 ECFDLTDGGG NPLFTLERAA FRPFGLLSRA VHLNGLTESD GRWHFWIGRR 

151 SPHKAVDPNK LDNTAAGGVS GGEMPSEAVC RESSEEAGLD KTLLPLIRPV 

201 SQLHSLRSVS RGVHNEILYV FDAVLPETFL PENQDGEVAG FEKMDIGGLL 

251 DAMLSGNMMH DAQLVTLDAF CRYGLIDAAH PLSEWLDGIR L* 
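The complete sequence of SEQ ID 411 runs from an ATG start codon to a TAG stop codon. The earlier partial ORF105 translation began at an ATG further upstream, and ORF105-1 starts at the M of the ORF105 motif PTMPTV. The sketch below is a minimal forward-strand ORF scan. It assumes an ORF runs from the first ATG in a reading frame to the next in-frame stop codon (TAA, TAG or TGA). It ignores alternative start codons and the reverse strand. The source does not describe how its ORFs were called, and `find_orfs` is an illustrative name.

```python
from typing import Iterator, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna: str, min_codons: int = 1) -> Iterator[Tuple[int, int]]:
    """Yield 0-based, end-exclusive (start, end) spans of forward-strand ORFs.

    An ORF here runs from the first ATG in a frame to the next in-frame stop
    codon (stop included); scanning then resumes after that stop. Stretches
    that reach the end of the sequence without a stop are not reported.
    """
    seq = dna.upper()
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None:
                if codon == "ATG":
                    start = i
            elif codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    yield start, i + 3
                start = None


for start, end in find_orfs("CCATGAAATAGGG"):
    print(start, end)  # 2 11
```

Because the scan keeps the first ATG of each open stretch, it reports the longest ORF from that start. A downstream in-frame ATG, like the one that begins ORF105-1, would be reported only if it were the first ATG after a stop.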

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A) 

ORF105 shows 89.4% identity over a 226aa overlap with an ORF (ORF105a) from strain A of N. 
meningitidis: 



                60        70        80        90       100       110
orf105.pep    ISERQTAVCLRLQIQAVWLQSSALSSRKPTMPTVRFTESVSKQDLDALFEWAKASYGAES
                                            ||||||||||||:|||||||||||||||||
orf105a                                     MPTVRFTESVSKHDLDALFEWAKASYGAES
                                                    10        20        30

               120       130       140       150       160       170
orf105.pep    CWKTLYLNGXPLGNLSPEWVERVXKDWEAGCXESSDGIFLNADGWPDMGGRLQHLALGWH
              ||||||||| |||||||||:||| ||||||| ||||||||||||||||| ||||||  |:
orf105a       CWKTLYLNGLPLGNLSPEWAERVKKDWEAGCSESSDGIFLNADGWPDMGRRLQHLARIWK
                      40        50        60        70        80        90

               180       190       200       210       220       230
orf105.pep    CAGLLDGWRNECFDLTDGGGNPLFTLERAXXRPXGLLSRAVHLNGLTESDGRWHFWIGRR
               |||| |||:|||||||||:||||:||||  || ||||||||||||:|||||||||||||
orf105a       EAGLLHGWRDECFDLTDGGSNPLFALERAAFRPFGLLSRAVHLNGLVESDGRWHFWIGRR
                     100       110       120       130       140       150

               240       250       260       270       280       290
orf105.pep    SPHKAVDPNKLDNTXAGGVSGGEMPSEAVCRESSEEAGLDKTLLPLIRPVSQLHSLRSVS
              ||||||||:||||| |||||:||:|||:||||||||||||||||||||||||||||| ||
orf105a       SPHKAVDPDKLDNTAAGGVSSGELPSETVCRESSEEAGLDKTLLPLIRPVSQLHSLRPVS
                     160       170       180       190       200       210

               300       310
orf105.pep    RGVHNEILYVFDAVLP
              ||||||||||||||||
orf105a       RGVHNEILYVFDAVLPETFLPENQDGEVAGFEKMDIGGLLAAMLSGNMMHDAQLVTLDAF
                     220       230       240       250       260       270

The complete length ORF105a nucleotide sequence <SEQ ID 413> is: 

1 ATGCCGACCG TCCGTTTTAC CGAATCCGTC AGCAAACACG ACCTTGATGC 

51 CCTATTCGAG TGGGCAAAGG CAAGTTACGG TGCGGAAAGT TGCTGGAAAA 

101 CGCTGTATCT GAACGGTCTG CCTTTGGGCA ATCTGTCGCC GGAATGGGCG 

151 GAGCGCGTCA AAAAAGACTG GGAGGCAGGC TGCTCGGAGT CTTCAGACGG 

201 CATTTTCCTG AATGCGGACG GCTGGCCAGA TATGGGCAGA CGCTTGCAGC 

251 ACCTCGCCCG AATATGGAAA GAAGCGGGAC TGCTTCACGG CTGGCGCGAC 

301 GAGTGTTTCG ACCTGACCGA CGGCGGCAGC AATCCCTTGT TCGCGCTCGA 

351 ACGCGCCGCT TTCCGTCCGT TCGGACTGCT CAGCCGCGCC GTCCATCTCA 

401 ACGGTTTGGT CGAATCGGAC GGCCGATGGC ATTTCTGGAT AGGCAGGCGC 

451 AGTCCGCACA AAGCAGTCGA TCCCGACAAA CTCGACAATA CTGCCGCCGG 

501 CGGTGTTTCC AGCGGTGAAT TGCCGTCTGA AACCGTGTGT CGCGAAAGCA 

551 GCGAAGAAGC CGGTTTGGAT AAAACGCTGC TTCCGCTCAT CCGCCCGGTA 

601 TCGCAGCTGC ACAGCCTGCG CCCCGTCAGC CGGGGTGTGC ACAATGAAAT 

651 CCTGTATGTA TTCGATGCCG TCCTGCCCGA AACCTTCCTG CCTGAAAATC 

701 AGGATGGCGA AGTGGCGGGT TTTGAGAAAA TGGACATCGG CGGTCTGTTG 

751 GCTGCCATGT TGTCGGGAAA CATGATGCAC GACGCGCAAC TGGTTACGCT 

801 GGACGCGTTT TGCCGTTACG GTCTGATTGA TGCCGCCCAT CCGCTGTCCG 

851 AGTGGCTGGA CGGCATACGT TTATAG 

This encodes a protein having amino acid sequence <SEQ ID 414>: 



1 MPTVRFTESV SKHDLDALFE WAKASYGAES CWKTLYLNGL PLGNLSPEWA 
51 ERVKKDWEAG CSESSDGIFL NADGWPDMGR RLQHLARIWK EAGLLHGWRD 
101 ECFDLTDGGS NPLFALERAA FRPFGLLSRA VHLNGLVESD GRWHFWIGRR

151 SPHKAVDPDK LDNTAAGGVS SGELPSETVC RESSEEAGLD KTLLPLIRPV 

201 SQLHSLRPVS RGVHNEILYV FDAVLPETFL PENQDGEVAG FEKMDIGGLL 

251 AAMLSGNMMH DAQLVTLDAF CRYGLIDAAH PLSEWLDGIR L* 

ORF105a and ORF105-1 show 93.8% identity in 291 aa overlap: 

                      10        20        30        40        50        60
orf105a.pep   MPTVRFTESVSKHDLDALFEWAKASYGAESCWKTLYLNGLPLGNLSPEWAERVKKDWEAG
              ||||||||||||:||||||||||||||||||||||||||||||||||||:||||||||||
orf105-1      MPTVRFTESVSKQDLDALFEWAKASYGAESCWKTLYLNGLPLGNLSPEWVERVKKDWEAG
                      10        20        30        40        50        60

                      70        80        90       100       110       120
orf105a.pep   CSESSDGIFLNADGWPDMGRRLQHLARIWKEAGLLHGWRDECFDLTDGGSNPLFALERAA
              ||||||||||||||||||| ||||||  |: |||| |||:|||||||||:||||:|||||
orf105-1      CSESSDGIFLNADGWPDMGGRLQHLALGWHCAGLLDGWRNECFDLTDGGGNPLFTLERAA
                      70        80        90       100       110       120

                     130       140       150       160       170       180
orf105a.pep   FRPFGLLSRAVHLNGLVESDGRWHFWIGRRSPHKAVDPDKLDNTAAGGVSSGELPSETVC
              ||||||||||||||||:|||||||||||||||||||||:|||||||||||:||:|||:||
orf105-1      FRPFGLLSRAVHLNGLTESDGRWHFWIGRRSPHKAVDPNKLDNTAAGGVSGGEMPSEAVC
                     130       140       150       160       170       180

                     190       200       210       220       230       240
orf105a.pep   RESSEEAGLDKTLLPLIRPVSQLHSLRPVSRGVHNEILYVFDAVLPETFLPENQDGEVAG
              ||||||||||||||||||||||||||| ||||||||||||||||||||||||||||||||
orf105-1      RESSEEAGLDKTLLPLIRPVSQLHSLRSVSRGVHNEILYVFDAVLPETFLPENQDGEVAG
                     190       200       210       220       230       240

                     250       260       270       280       290
orf105a.pep   FEKMDIGGLLAAMLSGNMMHDAQLVTLDAFCRYGLIDAAHPLSEWLDGIRLX
              |||||||||| ||||||||||||||||||||||||||||||||||||||||
orf105-1      FEKMDIGGLLDAMLSGNMMHDAQLVTLDAFCRYGLIDAAHPLSEWLDGIRLX
                     250       260       270       280       290
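ORF105a and ORF105-1 align end to end without gaps, so the differences behind the 93.8% figure can be listed position by position. The sketch below assumes two ungapped sequences of equal length and uses 1-based numbering, as in the listings. `substitutions` and `identity` are illustrative names.

```python
from typing import List, Tuple


def substitutions(ref: str, alt: str) -> List[Tuple[int, str, str]]:
    """Return (position, ref residue, alt residue) for each mismatch, 1-based."""
    if len(ref) != len(alt):
        raise ValueError("sequences must be ungapped and of equal length")
    return [(pos, r, a) for pos, (r, a) in enumerate(zip(ref, alt), start=1) if r != a]


def identity(ref: str, alt: str) -> float:
    """Percent identity of two ungapped, equal-length sequences."""
    return 100.0 * (len(ref) - len(substitutions(ref, alt))) / len(ref)


print(substitutions("MPTVRFTESVSKHD", "MPTVRFTESVSKQD"))  # [(13, 'H', 'Q')]
```

Applied to the 291 aligned residues of ORF105a (as reference) and ORF105-1, it reports 18 substitutions, from (13, 'H', 'Q') to (251, 'A', 'D'). That gives an identity of 273/291, or 93.8%.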

Homology with a predicted ORF from N. gonorrhoeae 

ORF105 shows 87.5% identity over a 312aa overlap with a predicted ORF (ORF105.ng) from N. 
gonorrhoeae: 

orf105.pep    MVARRAHNPKVVGSNPXPATXFQTPRFNAEXVLXLPVSCFLFPAASVFCRIFLPAAISER 60
              |||||||||||||||| ||| :|||||||| || ||      |||||||||||||||||||||
orf105ng      MVARRAHNPKVVGSNPAPATKYQTPRFNAEGVLF-----FLFPAASVFCRIFLPAAISER 55

orf105.pep    QTAVCLRLQIQAVWLQSSALSSRKPTMPTVRFTESVSKQDLDALFEWAKASYGAESCWKT 120
              |:|||||||||||||||||| ||||:|||||||||||||||||||| |||||||||||||
orf105ng      QAAVCLRLQIQAVWLQSSALCSRKPAMPTVRFTESVSKQDLDALFERAKASYGAESCWKT 115

orf105.pep    LYLNGXPLGNLSPEWVERVXKDWEAGCXESSDGIFLNADGWPDMGGRLQHLALGWHCAGL 180
              ||||  |||||||||:||: ||||||| |||:||||||||||||||||||||  |: |||
orf105ng      LYLNRLPLGNLSPEWAERIKKDWEAGCSESSNGIFLNADGWPDMGGRLQHLARTWNKAGL 175

orf105.pep    LDGWRNECFDLTDGGGNPLFTLERAXXRPXGLLSRAVHLNGLTESDGRWHFWIGRRSPHK 240
              | |||||||||||||||||||||||  || ||| ||||||||:||:||||||||||||||
orf105ng      LHGWRNECFDLTDGGGNPLFTLERAAFRPFGLLIRAVHLNGLVESNGRWHFWIGRRSPHK 235

orf105.pep    AVDPNKLDNTXAGGVSGGEMPSEAVCRESSEEAGLDKTLLPLIRPVSQLHSLRSVSRGVH 300
              ||||:||||:  |||||||||||||||||||||||||||:|||||||:||||| ||||||
orf105ng      AVDPGKLDNIAGGGVSGGEMPSEAVCRESSEEAGLDKTLFPLIRPVSRLHSLRPVSRGVH 295

orf105.pep    NEILYVFDAVLP 312
              ||||||||||||
orf105ng      NEILYVFDAVLPETFLPENQDGEVAGFEKMDIGGLLDAMLSKNMMHDAQLVTLDAFYRYG 355

A complete length ORF105ng nucleotide sequence <SEQ ID 415> was predicted to encode a
protein having amino acid sequence <SEQ ID 416>:
1 MVARRAHNPK VVGSNPAPAT KYQTPRFNAE GVLFFLFPAA SVFCRIFLPA

51 AISERQAAVC LRLQIQAVWL QSSALCSRKP AMPTVRFTES VSKQDLDALF

101 ERAKASYGAE SCWKTLYLNR LPLGNLSPEW AERIKKDWEA GCSESSNGIF

151 LNADGWPDMG GRLQHLARTW NKAGLLHGWR NECFDLTDGG GNPLFTLERA

201 AFRPFGLLIR AVHLNGLVES NGRWHFWIGR RSPHKAVDPG KLDNIAGGGV

251 SGGEMPSEAV CRESSEEAGL DKTLFPLIRP VSRLHSLRPV SRGVHNEILY

301 VFDAVLPETF LPENQDGEVA GFEKMDIGGL LDAMLSKNMM HDAQLVTLDA

351 FYRYGLIDAA HPLSEWLDGI RL*

Further work revealed the complete nucleotide sequence <SEQ ID 417>: 



1 ATGCCGACCG TCCGTTTTAC CGAATCCGTC AGCAAACAAG ACCTTGATGC 

51 CCTGTTCGAG CGGGCAAAAG CAAGTTACGG TGCCGAAAGT TGCTGGAAAA 

101 CGCTGTATCT GAACCGTCTT CCTTTGGGCA ATCTGTCGCC GGAATGGGCT 

151 GAGCGCATCA AAAAAGACTG GGAGGCAGGC TGCTCCGAGT CTTCAGACGG 

201 CATTTTTCTG AATGCGGACG GCTGGCCGGA TATGGGCGGA CGCTTGCAGC 

251 ACCTCGCCCG CACATGGAAC AAGGCGGGGC TGCTTCACGG ATGGCGCAAC 

301 GAGTGTTTCG ACCTGACCGA CGGCGGCGGC AACCCCTTGT TCACGCTCGA 

351 ACGCGCCGCT TTCCGTCCGT TCGGACTACT CAGCCGCGCC GTCCATCTCA 

401 ACGGTTTGGT CGAATCGAAC GGCAGATGGC ATTTTTGGAT AGGCAGGCGC

451 AGTCCGCACA AAGCAGTCGa tcCCGGCAAG CTCGACAATA TTGCCGGCGG 

501 CGGTGTTTCC GGCGGCGAAA TGCCGTCTGA AGCCGTGTGC CGCGAAAGCA 

551 GCGAAGAAGC CGGTTTGGAT AAAACGCTGT TTCCGCTCAT CCGCCCAGTA 

601 TCGCGGCTGC ACAGCCTTCG CCCCGTCAGC CGAGGTGTGC ACAATGAAAT 

651 CCTGTATGTG TTCGATGCCG TCCTGCCCGA AACCTTCCTG CCTGAAAATC 

701 AGGATGGCGA GGTAGCGGGT TTTGAAAAGA TGGACATTGG CGGCCTATTG 

751 GATGCCATGT TGTCGAAAAA CATGATGCAC GACGCGCAAC TGGTTACGCT 

801 GGACGCGTTT TACCGTTACG GTCTGATTGA TGCCGCCCAT CCGCTGTCCG 

851 AGTGGCTGGA CGGCATACGT TTATAG 

This corresponds to the amino acid sequence <SEQ ID 418; ORF105ng-l>: 



1 MPTVRFTESV SKQDLDALFE RAKASYGAES CWKTLYLNRL PLGNLSPEWA 

51 ERIKKDWEAG CSESSDGIFL NADGWPDMGG RLQHLARTWN KAGLLHGWRN 

101 ECFDLTDGGG NPLFTLERAA FRPFGLLSRA VHLNGLVESN GRWHFWIGRR 

151 SPHKAVDPGK LDNIAGGGVS GGEMPSEAVC RESSEEAGLD KTLFPLIRPV 

201 SRLHSLRPVS RGVHNEILYV FDAVLPETFL PENQDGEVAG FEKMDIGGLL 

251 DAMLSKNMMH DAQLVTLDAF YRYGLIDAAH PLSEWLDGIR L* 

ORF105ng-1 and ORF105-1 show 93.5% identity in a 291 aa overlap:



10 20 30 40 50 60 

orf105-1.pep MPTVRFTESVSKQDLDALFEWAKASYGAESCWKTLYLNGLPLGNLSPEWVERVKKDWEAG
I I I I I I I I i 1 I 1 I I I I I I I I I I I I I II I I I I I I I I t I I I I I I I I I I I : I I : I I I II I I 
orf105ng-1 MPTVRFTESVSKQDLDALFERAKASYGAESCWKTLYLNRLPLGNLSPEWAERIKKDWEAG

10 20 30 40 50 60



70 80 90 100 110 120 

orf105-1.pep CSESSDGIFLNADGWPDMGGRLQHLALGWHCAGLLDGWRNECFDLTDGGGNPLFTLERAA
I I I I I I I I 11 I I I I I I I I I I M I I I I I: I I I I I I I I 1 I I I I I I I I I I I I I I I I I I I 
orf105ng-1 CSESSDGIFLNADGWPDMGGRLQHLARTWNKAGLLHGWRNECFDLTDGGGNPLFTLERAA

70 80 90 100 110 120 



130 140 150 160 170 180 

orf105-1.pep FRPFGLLSRAVHLNGLTESDGRWHFWIGRRSPHKAVDPNKLDNTAAGGVSGGEMPSEAVC
I I I I I I I I I I I I I I I I : II : I I I I I I I I I I I I I I I I I I : I I I I I : I I II I I ! I I I I I I I 
orf105ng-1 FRPFGLLSRAVHLNGLVESNGRWHFWIGRRSPHKAVDPGKLDNIAGGGVSGGEMPSEAVC

130 140 150 160 170 180 



190 200 210 220 230 240 

orf105-1.pep RESSEEAGLDKTLLPLIRPVSQLHSLRSVSRGVHNEILYVFDAVLPETFLPENQDGEVAG
I I I I I I I I I I I II : I I I I I I I : 1 I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I 
orf105ng-1 RESSEEAGLDKTLFPLIRPVSRLHSLRPVSRGVHNEILYVFDAVLPETFLPENQDGEVAG

190 200 210 220 230 240 



250 260 270 280 290 

orf105-1.pep FEKMDIGGLLDAMLSGNMMHDAQLVTLDAFCRYGLIDAAHPLSEWLDGIRLX
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 1 I I I I I I I I I I i I I I I I I I I 
orf105ng-1 FEKMDIGGLLDAMLSKNMMHDAQLVTLDAFYRYGLIDAAHPLSEWLDGIRLX

250 260 270 280 290 
Furthermore, ORF105ng-1 shows homology with a yeast enzyme:

sp|P41888|TNR3_SCHPO THIAMIN PYROPHOSPHOKINASE (TPK) (THIAMIN KINASE)
>gi|1076928|pir||S52350 thiamin pyrophosphokinase (EC 2.7.6.2) - fission yeast
(Schizosaccharomyces pombe) >gi|666111 (X84417) thiamin pyrophosphokinase
[Schizosaccharomyces pombe] >gi|2330852|gnl|PID|e334056 (Z98533) thiamin
pyrophosphokinase [Schizosaccharomyces pombe] Length = 569
Score = 105 bits (259), Expect = 4e-22

Identities = 64/192 (33%), Positives = 94/192 (48%), Gaps = 3/192 (1%) 

Query: 268 NKAGLLHGWRNECFDLTDGGGNPLFTLERAAFRPFGLLSRAVHLNGLVESNGRW--HFWI 441

N G+ WRNE + + P+ +ER F FG LS VH + + W+ 

Sbjct: 96 NTFGIADQWRNELYTVYGKSKKPVLAVERGGFWLFGFLSTGVHCTMYIPATKEHPLRIWV 155 

Query: 442 GRRSPHKAVDPGKLDNIAGGGVSGGEMPSEAVCRESSEEAGLDKTLFPLIRPVSRLHSLR 621 

RRSP K P LDN GG++ G+ + +E SEEA LD + LI P + ++ 

Sbjct: 156 PRRSPTKQTWPNYLDNSVAGGIAHGDSVIGTMIKEFSEEANLDVSSMNLI-PCGTVSYIK 214 

Query: 622 PVSRG-VHNEILYVFDAVLPETFLPENQDGEVAGFEKMDIGGLLDAMLSKNMMHDAQLVT 798 

R + E+ YVFD + + +P DGEVAGF + + +L + K+ + LV 
Sbjct: 215 MEKRHWIQPELQYVFDLPVDDLVIPRINDGEVAGFSLLPLNQVLHELELKSFKPNCALVL 274 

Query: 799 LDAFYRYGLIDAAHP 843

LD R+G+I HP 
Sbjct: 275 LDFLIRHGIITPQHP 289 
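The Query coordinates in this hit advance by three per residue: 268 to 441 spans 174 positions, which is 58 residues. This suggests they are nucleotide positions on the ORF105ng-1 coding sequence <SEQ ID 417>, as in a translated (blastx-style) search. That reading is our inference; the source does not state it. The sketch below converts such coordinates to residue numbers and checks them against ORF105ng-1 <SEQ ID 418>. The function names are illustrative only.

```python
# Sketch, assuming the Query coordinates are 1-based nucleotide positions
# counted from the A of the ORF105ng-1 start codon (our reading of the hit).

ORF105NG_1 = (
    "MPTVRFTESVSKQDLDALFERAKASYGAESCWKTLYLNRLPLGNLSPEWA"
    "ERIKKDWEAGCSESSDGIFLNADGWPDMGGRLQHLARTWNKAGLLHGWRN"
    "ECFDLTDGGGNPLFTLERAAFRPFGLLSRAVHLNGLVESNGRWHFWIGRR"
    "SPHKAVDPGKLDNIAGGGVSGGEMPSEAVCRESSEEAGLDKTLFPLIRPV"
    "SRLHSLRPVSRGVHNEILYVFDAVLPETFLPENQDGEVAGFEKMDIGGLL"
    "DAMLSKNMMHDAQLVTLDAFYRYGLIDAAHPLSEWLDGIRL"
)


def nt_to_residue(nt_pos):
    """Map a 1-based nucleotide position to the 1-based codon (residue) number."""
    return (nt_pos - 1) // 3 + 1


def query_segment(nt_start, nt_end):
    """Residues of ORF105ng-1 covered by a Query line, as a gap-free string."""
    first, last = nt_to_residue(nt_start), nt_to_residue(nt_end)
    return ORF105NG_1[first - 1:last]


# First Query line of the hit, with its two gap characters removed.
hit = "NKAGLLHGWRNECFDLTDGGGNPLFTLERAAFRPFGLLSRAVHLNGLVESNGRW" + "HFWI"
print(nt_to_residue(268), nt_to_residue(441))  # 90 147
print(query_segment(268, 441) == hit)          # True
```

Under this reading, the first Query line covers residues 90 to 147 of ORF105ng-1, and the Query line 622 to 798 covers residues 208 to 266.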

Based on this analysis, including the presence of a putative transmembrane domain in the 
gonococcal protein, it is predicted that the proteins from N. meningitidis and N. gonorrhoeae, and
their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies. 



Example 49 

The following DNA sequence, believed to be complete, was identified in N. meningitidis <SEQ ID 419>:

1 ATGAATAGAC CCAAGCAACC CTTCTTCCGT CCCGAAGTCG CCGTTGCCCG 

51 CCAAACCAGC CTGACGGGTA AAGTGATTCT GACACGACCG TTGTCATTTT 

101 CCCTATGGAC GACATTTGCA TCGATATCTG CGTTATTGAT TATCCTGTTT 

151 TTGATATTTG GTAACTATAC GCGAAAGACA ACAGTGGAGG GACAAATTTT 

201 ACCTGCATCG GGCGTAATCA GGGTGTATGC ACCGgATACG rGkACAATTA 

251 CAGCGAAATT CGTGGAAGAT GGmsAAAAGG TTAAGGCTGG CGACAAGCTA 

301 TTTGCGCTTT CGACCTCACG TTTCGGCGCA GGAGGTAGCG TGCAGCAGCA 

351 GTTGAAAACG GAGGCAGTTT TGAAGAAAAC GTTGGCAGAA CAGGAACTGG 

401 GTCGTCTGAA GCTGATACAC GGGAATGAAA CGCGCAgCcT TAAAGCAACT 

451 GTCGAACGTT TGGAAAACCA GGAACTCCAT ATTTCGCAAC AGATAGACGG 

501 TCAGAAAAGG CGCATTAGAC TTGCGGAAGA AATGTTGCAG AAATATCGTT 

551 TCCTATCCGC . CAATGA 

This corresponds to the amino acid sequence <SEQ ID 420; ORF107>: 

1 MNRPKQPFFR PEVAVARQTS LTGKVILTRP LSFSLWTTFA SISALLIILF 

51 LIFGNYTRKT TVEGQILPAS GVIRVYAPDT XTITAKFVED GXKVKAGDKL 

101 FALSTSRFGA GGSVQQQLKT EAVLKKTLAE QELGRLKLIH GNETRSLKAT 

151 VERLENQELH ISQQIDGQKR RIRLAEEMLQ KYRFLSXQ* 

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A) 

ORF107 shows 97.8% identity over a 186aa overlap with an ORF (ORF107a) from strain A of N.
meningitidis:

10 20 30 40 50 60
orf107.pep MNRPKQPFFRPEVAVARQTSLTGKVILTRPLSFSLWTTFASISALLIILFLIFGNYTRKT
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I
orf107a MNRPKQPFFRPEVAVARQTSLTGKVILTRPLSFSLWTTFASISALLIILFLIFGNYTRKT
10 20 30 40 50 60

70 80 90 100 110 120
orf107.pep TVEGQILPASGVIRVYAPDTXTITAKFVEDGXKVKAGDKLFALSTSRFGAGGSVQQQLKT
I I I I I I I I I I I I I I I I I I I I I I I I I I ill I I 1 I I I I I I I I I I I I I I I I I I I I I I I I
orf107a TVEGQILPASGVIRVYAPDTGTITAKFXEDGEKVKAGDKLFALSTSRFGAGDSVQQQLKT
70 80 90 100 110 120

130 140 150 160 170 180
orf107.pep EAVLKKTLAEQELGRLKLIHGNETRSLKATVERLENQELHISQQIDGQKRRIRLAEEMLQ
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I
orf107a EAVLKKTLAEQELGRLKLIHGNETRSLKATVERLENQELHISQQIDGQKRRIRLAEEMLQ
130 140 150 160 170 180

189
orf107.pep KYRFLSXQX
I I I I I I
orf107a KYRFLSANDAVPKQEMMNVKAELLEQKAKLDAYRREEVGLLQEIRTQNLTLXSLPQAAX
190 200 210 220 230

The complete length ORF107a nucleotide sequence <SEQ ID 421> is: 

1 ATGAATAGAC CCAAGCAACC NTTCTTCCGT CCCGAAGTCG CCGTTGCCCG 

51 CCAAACCAGC CTGACGGGTA AAGTGATTCT GACACGACCG TTGTCATTTT 

101 CCCTATGGAC GACATTTGCA TCGATATCTG CGTTATTGAT TATCCTGTTT 

151 TTGATATTTG GTAACTATAC GCGAAAGACA ACAGTGGAGG GACAAATTTT 

201 ACCTGCATCG GGCGTAATCA GGGTGTATGC ACCGGATACG GGGACAATTA 

251 CNGCGAAATT CNTGGAAGAT GGAGAAAAGG TTAAGGCTGG CGACAAGCTA 

301 TTTGCGCTTT CGACCTCACG TTTCGGCGCA GGAGATAGCG TGCAGCAGCA 

351 GTTGAAAACG GAGGCAGTTT TGAAGAAAAC GTTGGCAGAA CAGGAACTGG 

401 GTCGTCTGAA GCTGATACAC GGGAATGAAA CGCGCAGCCT TAAAGCAACT 

451 GTCGAACGTT TGGAAAACCA GGAACTCCAT ATTTCGCAAC AGATAGACGG 

501 TCAGAAAAGG CGCATTAGAC TTGCGGAAGA AATGTTGCAG AAATATCGTT 

551 TCCTATCCGC CAATGATGCA GTGCCAAAAC AAGAAATGAT GAATGTCAAG 

601 GCAGAGCTTT TAGAGCAGAA AGCCAAACTT GATGCCTACC GCCGAGAAGA 

651 AGTCGGGCTG CTTCAGGAAA TCCGCACGCA GAATCTGACA TTGGNNAGCC 

701 TCCCCCAAGC GGCATGA 

This encodes a protein having amino acid sequence <SEQ ID 422>: 

1 MNRPKQPFFR PEVAVARQTS LTGKVILTRP LSFSLWTTFA SISALLIILF

51 LIFGNYTRKT TVEGQILPAS GVIRVYAPDT GTITAKFXED GEKVKAGDKL

101 FALSTSRFGA GDSVQQQLKT EAVLKKTLAE QELGRLKLIH GNETRSLKAT 

151 VERLENQELH ISQQIDGQKR RIRLAEEMLQ KYRFLSANDA VPKQEMMNVK 

201 AELLEQKAKL DAYRREEVGL LQEIRTQNLT LXSLPQAA* 

Homology with a predicted ORF from N. gonorrhoeae 

ORF107 shows 95.7% identity over a 188aa overlap with a predicted ORF (ORF107.ng) from N. 
gonorrhoeae: 

orf107.pep MNRPKQPFFRPEVAVARQTSLTGKVILTRPLSFSLWTTFASISALLIILFLIFGNYTRKT 60

I I I I I I I I I I I I I I : I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II 1 I 
orf107ng MNRPKQPFFRPEVAIARQTSLTGKVILTRPLSFSLWTTFASISALLIILFLIFGNYTRKT 60

orf107.pep TVEGQILPASGVIRVYAPDTXTITAKFVEDGXKVKAGDKLFALSTSRFGAGGSVQQQLKT 120

I : I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I 
orf107ng TMEGQILPASGVIRVYAPDTGTITAKFVEDGEKVKAGDKLFALSTSRFGAGGSVQQQLKT 120

orf107.pep EAVLKKTLAEQELGRLKLIHGNETRSLKATVERLENQELHISQQIDGQKRRIRLAEEMLQ 180

I I I I I I I I 1 I I I I I II I ! I I I I I I I I I I I I I I I I I I : I I I I I I I I I I I I I I I I I I I I I : 
orf107ng EAVLKKTLAEQELGRLKLIHENETRSLKATVERLENQKLHISQQIDGQKRRIRLAEEMLR 180



orf107.pep KYRFLSXQ 188

II I II I I
orf107ng KYRFLSAQ 188
The complete length ORF107ng nucleotide sequence <SEQ ID 423> is predicted to encode a
protein having amino acid sequence <SEQ ID 424>: 

1 MNRPKQPFFR PEVAIARQTS LTGKVILTRP LSFSLWTTFA SISALLIILF

51 LIFGNYTRKT TMEGQILPAS GVIRVYAPDT GTITAKFVED GEKVKAGDKL

101 FALSTSRFGA GGSVQQQLKT EAVLKKTLAE QELGRLKLIH ENETRSLKAT 

151 VERLENQKLH ISQQIDGQKR RIRLAEEMLR KYRFLSAQ* 

Based on the presence of a putative transmembrane domain in the gonococcal protein, it is predicted
that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be useful
antigens for vaccines or diagnostics, or for raising antibodies. 
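The source does not say how its putative transmembrane domains were predicted. One common approach is a hydropathy scan, which averages the Kyte-Doolittle scale over a sliding window. The 19-residue window and the 1.6 cut-off used below are textbook defaults assumed here; they are not parameters taken from the source. The sketch scans ORF107ng <SEQ ID 424>.

```python
# Hydropathy-scan sketch. Assumptions (not from the source): Kyte-Doolittle
# scale, 19-residue window, and a window mean above 1.6 taken as a candidate
# membrane-spanning segment.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
WINDOW = 19
CUTOFF = 1.6

ORF107NG = (
    "MNRPKQPFFRPEVAIARQTSLTGKVILTRPLSFSLWTTFASISALLIILF"
    "LIFGNYTRKTTMEGQILPASGVIRVYAPDTGTITAKFVEDGEKVKAGDKL"
    "FALSTSRFGAGGSVQQQLKTEAVLKKTLAEQELGRLKLIHENETRSLKAT"
    "VERLENQKLHISQQIDGQKRRIRLAEEMLRKYRFLSAQ"
)


def window_means(seq, width=WINDOW):
    """Mean hydropathy of every full-length window, keyed by the 1-based
    position of its first residue. Residues missing from the scale count as 0."""
    return {
        start + 1: sum(KYTE_DOOLITTLE.get(aa, 0.0) for aa in seq[start:start + width]) / width
        for start in range(len(seq) - width + 1)
    }


means = window_means(ORF107NG)
flagged = [start for start, value in means.items() if value > CUTOFF]
best = max(means, key=means.get)
print(f"best window: residues {best}-{best + WINDOW - 1}, mean {means[best]:.2f}")
print(f"windows above {CUTOFF}: start positions {flagged[0]}-{flagged[-1]}")
```

For ORF107ng, the windows above the cut-off start between residues 30 and 40. They cover the hydrophobic stretch LSFSLWTTFASISALLIILFLIFG (residues 31 to 54).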

Example 50 

The following DNA sequence, believed to be complete, was identified in N. meningitidis <SEQ ID 425>:

1 ATGCTGAATA CTTTTTTTGC CGTATTGGGC GGCTGCCTGC TGCT.TTGCC 

51 GTGCGGCAAA TCCGTAAATA CGGCGGTACA GCCGCAAAAC GCGGTACAAA 

101 GCGCGCCGAA ACCGGTTTTC AAAGTCATAT ATATCGACAA TACGGCGATT 

151 GCCGGTTTGG ATTTGGGACA AAGCAGCGAA GGCAAAACCA ACGACGGCAA 

201 AAAACAAATC AGTTATCCGA TTAAAGGCTT GCCGGAACAA AATGTTATCC 

251 GACTGATCGG CAAGCATCCC GGCGACTTGG AAGCCGTCAG CGGCAAATGT 

301 ATGGAAACCG ATGATAAGGA CAGTCCGGCA GGTTGGGCAG AAAACGGCGT 

351 GTGCCATACC TTGTTTGCCA AACTGGTGGG CAATATCGCC GAAGACGGCG 

401 GCAAACTGAC GGATTACCTA GTTTCGCATG CCGCCCTGCA ACCCTATCAG 

451 GCAGGCAAAA GCGGCTATGC CGCCGTGCAG AACGGACGCT ATGTGCTGGA 

501 AATCGACAGC GAAGGGGCGT TTTATTTCCG CCGCCGCCAT TATTGA 

This corresponds to the amino acid sequence <SEQ ID 426; ORF108>: 

1 MLNTFFAVLG GCLLXLPCGK SVNTAVQPQN AVQSAPKPVF KVIYIDNTAI 

51 AGLDLGQSSE GKTNDGKKQI SYPIKGLPEQ NVIRLIGKHP GDLEAVSGKC 

101 METDDKDSPA GWAENGVCHT LFAKLVGNIA EDGGKLTDYL VSHAALQPYQ 

151 AGKSGYAAVQ NGRYVLEIDS EGAFYFRRRH Y* 

Further work revealed the following DNA sequence <SEQ ID 427>: 

1 ATGCTGAAAA CATCTTTTGC CGTATTGGGC GGCTGCCTGC TGCTTGCCGC 

51 GTGCGGCAAA TCCGAAAATA CGGCGGAACA GCCGCAAAAC GCGGTACAAA 

101 GCGCGCCGAA ACCGGTTTTC AAAGTCAAAT ATATCGACAA TACGGCGATT 

151 GCCGGTTTGG ATTTGGGACA AAGCAGCGAA GGCAAAACCA ACGACGGCAA 

201 AAAACAAATC AGTTATCCGA TTAAAGGCTT GCCGGAACAA AATGTTATCC 

251 GACTGATCGG CAAGCATCCC GGCGACTTGG AAGCCGTCAG CGGCAAATGT 

301 ATGGAAACCG ATGATAAGGA CAGTCCGGCA GGTTGGGCAG AAAACGGCGT 

351 GTGCCATACC TTGTTTGCCA AACTGGTGGG CAATATCGCC GAAGACGGCG 

401 GCAAACTGAC GGATTACCTA GTTTCGCATG CCGCCCTGCA ACCCTATCAG 

451 GCAGGCAAAA GCGGCTATGC CGCCGTGCAG AACGGACGCT ATGTGCTGGA 

501 AATCGACAGC GAAGGGGCGT TTTATTTCCG CCGCCGCCAT TATTGA 

This corresponds to the amino acid sequence <SEQ ID 428; ORF108-1>: 



1 MLKTSFAVLG GCLLLAACGK SENTAEQPQN AVQSAPKPVF KVKYIDNTAI

51 AGLDLGQSSE GKTNDGKKQI SYPIKGLPEQ NVIRLIGKHP GDLEAVSGKC 

101 METDDKDSPA GWAENGVCHT LFAKLVGNIA EDGGKLTDYL VSHAALQPYQ 

151 AGKSGYAAVQ NGRYVLEIDS EGAFYFRRRH Y* 
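Each "corresponds to the amino acid sequence" step is a translation of the DNA with the standard genetic code, read in frame from the ATG start codon to the first stop codon. The minimal Python sketch below reproduces ORF108-1 <SEQ ID 428> from <SEQ ID 427>. The helper names are illustrative. The sketch does not handle alternative start codons such as GTG or TTG, which would be read as Met at position 1.

```python
# Standard genetic code as a 64-letter string. Codons are ordered TTT, TTC,
# TTA, TTG, TCT, ... GGG, with each base cycling through T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(dna):
    """Translate an in-frame coding sequence up to (not including) the first
    stop codon. Whitespace is ignored; codons with ambiguity codes become X."""
    dna = "".join(dna.split()).upper()
    protein = []
    for pos in range(0, len(dna) - 2, 3):
        residue = CODON_TABLE.get(dna[pos:pos + 3], "X")
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)


# <SEQ ID 427> in its printed 10-base blocks (position numbers dropped).
SEQ_427 = """
ATGCTGAAAA CATCTTTTGC CGTATTGGGC GGCTGCCTGC TGCTTGCCGC
GTGCGGCAAA TCCGAAAATA CGGCGGAACA GCCGCAAAAC GCGGTACAAA
GCGCGCCGAA ACCGGTTTTC AAAGTCAAAT ATATCGACAA TACGGCGATT
GCCGGTTTGG ATTTGGGACA AAGCAGCGAA GGCAAAACCA ACGACGGCAA
AAAACAAATC AGTTATCCGA TTAAAGGCTT GCCGGAACAA AATGTTATCC
GACTGATCGG CAAGCATCCC GGCGACTTGG AAGCCGTCAG CGGCAAATGT
ATGGAAACCG ATGATAAGGA CAGTCCGGCA GGTTGGGCAG AAAACGGCGT
GTGCCATACC TTGTTTGCCA AACTGGTGGG CAATATCGCC GAAGACGGCG
GCAAACTGAC GGATTACCTA GTTTCGCATG CCGCCCTGCA ACCCTATCAG
GCAGGCAAAA GCGGCTATGC CGCCGTGCAG AACGGACGCT ATGTGCTGGA
AATCGACAGC GAAGGGGCGT TTTATTTCCG CCGCCGCCAT TATTGA
"""

# ORF108-1 <SEQ ID 428>, without the terminal "*".
SEQ_428 = """
MLKTSFAVLG GCLLLAACGK SENTAEQPQN AVQSAPKPVF KVKYIDNTAI
AGLDLGQSSE GKTNDGKKQI SYPIKGLPEQ NVIRLIGKHP GDLEAVSGKC
METDDKDSPA GWAENGVCHT LFAKLVGNIA EDGGKLTDYL VSHAALQPYQ
AGKSGYAAVQ NGRYVLEIDS EGAFYFRRRH Y
"""

print(translate(SEQ_427) == "".join(SEQ_428.split()))  # True
```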



Computer analysis of this amino acid sequence gave the following results: 



Homology with a predicted ORF from N. gonorrhoeae 

ORF108 shows 88.4% identity over a 181aa overlap with a predicted ORF (ORF108.ng) from N. 
gonorrhoeae: 

orf108.pep MLNTFFAVLGGCLLXLPCGKSVNTAVQPQNAVQSAPKPVFKVIYIDNTAIAGLDLGQSSE 60

II: I I I I I i I II I I I I I I I I I I I I : I I I I I I II I I I I I I I II I I I I I I I I I 
orf108ng MLKIPFAVLGGCLLLAACGKSENTAEQPQNAAQSAPKPVFKVKYIDNTAIAGLALGQSSE 60

orf108.pep GKTNDGKKQISYPIKGLPEQNVIRLIGKHPGDLEAVSGKCMETDDKDSPAGWAENGVCHT 120

I I I I I I I I I I I I I I I I I I I I I : : I I I I I I : I I I I I I I I I I I I I I : I : I I I I I I I I I I 
orf108ng GKTNDGKKQISYPIKGLPEQNAVRLTGKHPNDLEAVVGKCMETDGKDAPSGWAENGVCHT 120

orf108.pep LFAKLVGNIAEDGGKLTDYLVSHAALQPYQAGKSGYAAVQNGRYVLEIDSEGAFYFRRRHY 181

I I I I I I I I I I I I I I I I I I I I : I I : I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I 
orf108ng LFAKLVGNIAEDGGKLTDYLISHSALQPYQAGKSGYAAVQNGRYVLEIDSEGAFYFRRRHY 181

ORF108-1 shows 92.3% identity with ORF108ng over the same 181 aa overlap:



orf108-1.pep MLKTSFAVLGGCLLLAACGKSENTAEQPQNAVQSAPKPVFKVKYIDNTAIAGLDLGQSSE 60

III I I I I I I I I I I I II M M I I I I I II I I : I I I I M I I I I I I I I I I I I II I llllll 
orf108ng-1 MLKIPFAVLGGCLLLAACGKSENTAEQPQNAAQSAPKPVFKVKYIDNTAIAGLALGQSSE 60

orf108-1.pep GKTNDGKKQISYPIKGLPEQNVIRLIGKHPGDLEAVSGKCMETDDKDSPAGWAENGVCHT 120

I I I I I I M I I I I I I I M I I I I :: I I IIIMIMII 1(1 MM M : I : I I I I I I I II I 
orf108ng-1 GKTNDGKKQISYPIKGLPEQNAVRLTGKHPNDLEAVVGKCMETDGKDAPSGWAENGVCHT 120

orf108-1.pep LFAKLVGNIAEDGGKLTDYLVSHAALQPYQAGKSGYAAVQNGRYVLEIDSEGAFYFRRRHY 181

I I I I I I I II II I I I I I I I I I : I I : I I I I II II I I I I I I II I I I I I I I I I II I I I M I I I I I 
orf108ng-1 LFAKLVGNIAEDGGKLTDYLISHSALQPYQAGKSGYAAVQNGRYVLEIDSEGAFYFRRRHY 181
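This 181-residue overlap contains no gaps, so the percent identity is simply the number of identical aligned positions divided by the overlap length. The sketch below recomputes the 92.3% figure from ORF108-1 <SEQ ID 428> and ORF108ng <SEQ ID 430>. The ORF108ng string uses GQSSE (residues 56 to 60) and EAVVGKC (residues 94 to 100), which are the residues encoded by the ORF108ng nucleotide sequence <SEQ ID 429>. Alignment programs that introduce gaps use other conventions, so this shortcut applies only to gap-free overlaps like this one.

```python
def percent_identity(seq_a, seq_b):
    """Identity of a gap-free alignment of two equal-length sequences.
    Returns (identical positions, overlap length, percent identity)."""
    a = "".join(seq_a.split())
    b = "".join(seq_b.split())
    if len(a) != len(b):
        raise ValueError("sequences must be aligned without gaps and have equal length")
    identical = sum(x == y for x, y in zip(a, b))
    return identical, len(a), 100.0 * identical / len(a)


ORF108_1 = """
MLKTSFAVLG GCLLLAACGK SENTAEQPQN AVQSAPKPVF KVKYIDNTAI
AGLDLGQSSE GKTNDGKKQI SYPIKGLPEQ NVIRLIGKHP GDLEAVSGKC
METDDKDSPA GWAENGVCHT LFAKLVGNIA EDGGKLTDYL VSHAALQPYQ
AGKSGYAAVQ NGRYVLEIDS EGAFYFRRRH Y
"""

ORF108NG = """
MLKIPFAVLG GCLLLAACGK SENTAEQPQN AAQSAPKPVF KVKYIDNTAI
AGLALGQSSE GKTNDGKKQI SYPIKGLPEQ NAVRLTGKHP NDLEAVVGKC
METDGKDAPS GWAENGVCHT LFAKLVGNIA EDGGKLTDYL ISHSALQPYQ
AGKSGYAAVQ NGRYVLEIDS EGAFYFRRRH Y
"""

identical, length, pct = percent_identity(ORF108_1, ORF108NG)
print(f"{identical}/{length} identical = {pct:.1f}%")  # 167/181 identical = 92.3%
```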

The complete length ORF108ng nucleotide sequence <SEQ ID 429> is: 



1 ATGCTGAAAa tacctTTTGC CGTGTtgggc ggCtgcctGC TGCTTGCCGC 

51 CTGCGGCAAA TCCGAAAATa cggcggaACA GCCGCAAAAT gcggCACAAA

101 GCGCGCCGAA ACCGGTTTTC AAAGTCAAAT ACATCGACAA TACGGCGATT 

151 GCCGGTTTGG CTTTGGGACA AAGTAGCGAA GGCAAAACCA acgacgGCAA 

201 AAAACAAATC AGTTATccgA TTAAAGGCTT GCCGGAACAA AacgccgtCC 

251 gGCTGACCGG AAAGCATCCC AACGACTTGG AagccgtcgT CGGCAAATGT 

301 ATGGAAACCG ACGGAAAGGA CGCGCCTTCG GGCTGGGCGG AAAACGGCGT

351 GTGCCATACC TTGTTTGCCA AACTGGTGGG CAATATCGCC GAAGACGGCG 

401 GCAAACTGAC TGATTACCTG ATTTCGCATT CCGCCCTGCA ACCCTATCAG 

451 GCAGGCAAAA GCGGCTATGC CGCCGTGCAG AACGGACGCT ATGTGCTGGA 

501 AATCGACAGC GagggGGCGT TTTATttccg ccgccgccat tattgA 

This encodes a protein having amino acid sequence <SEQ ID 430>:



1 MLKIPFAVLG GCLLLAACGK SENTAEQPQN AAQSAPKPVF KVKYIDNTAI

51 AGLALGQSSE GKTNDGKKQI SYPIKGLPEQ NAVRLTGKHP NDLEAVVGKC

101 METDGKDAPS GWAENGVCHT LFAKLVGNIA EDGGKLTDYL ISHSALQPYQ

151 AGKSGYAAVQ NGRYVLEIDS EGAFYFRRRH Y* 

Based on this analysis, including the presence of a predicted prokaryotic membrane lipoprotein
lipid attachment site (underlined) and a putative ATP/GTP-binding site motif A (P-loop,
double-underlined) in the gonococcal protein, it is predicted that the proteins from N. meningitidis and
N. gonorrhoeae, and their epitopes, could be useful antigens for vaccines or diagnostics, or for
raising antibodies.
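The source does not say which tool located these two sites. Both descriptions match PROSITE patterns: PS00013 (prokaryotic membrane lipoprotein lipid attachment site) and PS00017 (ATP/GTP-binding site motif A, P-loop). The sketch below is a minimal regular-expression scan of ORF108ng <SEQ ID 430> using those patterns as we transcribe them. Check the current PROSITE entries before reusing the patterns. The ORF108ng string uses the residues encoded by <SEQ ID 429>, namely GQSSE at 56 to 60 and EAVVGKC at 94 to 100.

```python
import re

# PROSITE PS00013 (our transcription): {DERK}(6)-[LIVMFWSTAG](2)-[LIVMFYSTAGCQ]-[AGS]-C
# "{...}" means "any residue except", so it becomes a negated character class.
LIPOPROTEIN_SITE = re.compile(r"[^DERK]{6}[LIVMFWSTAG]{2}[LIVMFYSTAGCQ][AGS]C")
# PROSITE PS00017 (our transcription): [AG]-x(4)-G-K-[ST], where x is any residue.
P_LOOP = re.compile(r"[AG].{4}GK[ST]")

ORF108NG = (
    "MLKIPFAVLGGCLLLAACGKSENTAEQPQNAAQSAPKPVFKVKYIDNTAI"
    "AGLALGQSSEGKTNDGKKQISYPIKGLPEQNAVRLTGKHPNDLEAVVGKC"
    "METDGKDAPSGWAENGVCHTLFAKLVGNIAEDGGKLTDYLISHSALQPYQ"
    "AGKSGYAAVQNGRYVLEIDSEGAFYFRRRHY"
)


def motif_hits(pattern, seq):
    """(first residue, last residue, matched text), 1-based and inclusive.
    finditer reports non-overlapping matches only, which is enough here."""
    return [(m.start() + 1, m.end(), m.group()) for m in pattern.finditer(seq)]


print(motif_hits(LIPOPROTEIN_SITE, ORF108NG))  # [(8, 18, 'VLGGCLLLAAC')]
print(motif_hits(P_LOOP, ORF108NG))            # [(56, 63, 'GQSSEGKT')]
```

In this reading, the putative lipid-modified cysteine is Cys18. A pattern hit is only circumstantial evidence, particularly for a short pattern such as the P-loop.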



Example 51

The following DNA sequence was identified in N. meningitidis <SEQ ID 431>:
1 ATGGAAGATT TATATATAAT ACTCGCTTTG GGTTTGGTTG CGATGATTGC

51 CGgATTTATC GATgcgatTg cGggCGGGGG TGGTTTGATT ACGCTGCCCG 

101 CACTCTTGTT GGCAGGTATT CCTCCCGTGT CGGCAATTGC CACCAACAAG 

151 CTGCAAgCAG CCGCTGCTAC GTTTTCAGCT ACGGTTTCTT TTGCACGCAA 

201 AGGTTTGATT GATTGGAAGA AAGGTCTCCC GATTGCCGCA GCATCGTTTG 

251 TAGGCGGCGT GGcCGGTGCA TTATCGGTCA GCTTGGTTTC CAAAGATATT 

301 CTgCTgGCGG TCGTGCCGGT TTTGTTGATA TTTGTCGCAC TGTATTTTGT 

351 GTTTTCGCCC AAGCTCGACG GCAGTAAGGA AGGCAAAGCC AGAATGTCTT 

401 TTTTTCTGTT cGGGCTGACG GTCGC.ACCG CTTTTGGGTT TTTACGACGG 

451 TGTGTTCGGA CCGGGTGTCG GCTCGTTTTT TCTGATTGCC TTTATTGTTT 

501 TGCTCGGCTG CAAgCTGTTG AACGCGATGT CTTACACCAA ATTGGCGAAC 

551 GTTGCCTGCA ATCTTGGTTC GCTATCGGTA TTCCTGCTGC ACGGTTCGAT 

601 TATTTTCCCG ATTGCGGCAA CGaTGGCGGT CGGTGCGTTT GTCGGtGCGA 

651 ATTTAgGTGC GAGATTTGCC GTaCgctTCG GTTCGAAGCT GATTAA 

This corresponds to the amino acid sequence <SEQ ID 432; ORF109>: 



1 MEDLYIILAL GLVAMIAGFI DAIAGGGGLI TLPALLLAGI PPVSAIATNK 

51 LQAAAATFSA TVSFARKGLI DWKKGLPIAA ASFVGGVAGA LSVSLVSKDI 

101 LLAVVPVLLI FVALYFVFSP KLDGSKEGKA RMSFFLFGLT VXTAFGFLRR

151 CVRTGCRLVF SDCLYCFARL QAVERDVLHQ IGERCLQSWF AIGIPAARFD 

201 YFPDCGNDGG RCVCRCEFRC EICRTLRFEA D* 

Further work revealed the following DNA sequence <SEQ ID 433>:



1 ATGGAAGATT TATATATAAT ACTCGCTTTG GGTTTGGTTG CGATGATTGC 

51 CGGATTTATC GATGCGATTG CGGGCGGGGG TGGTTTGATT ACGCTGCCCG 

101 CACTCTTGTT GGCAGGTATT CCTCCCGTGT CGGCAATTGC CACCAACAAG 

151 CTGCAAGCAG CCGCTGCTAC GTTTTCAGCT ACGGTTTCTT TTGCACGCAA 

201 AGGTTTGATT GATTGGAAGA AAGGTCTCCC GATTGCCGCA GCATCGTTTG 

251 TAGGCGGCGT GGCCGGTGCA TTATCGGTCA GCTTGGTTTC CAAAGATATT 

301 CTGCTGGCGG TCGTGCCGGT TTTGTTGATA TTTGTCGCAC TGTATTTTGT 

351 GTTTTCGCCC AAGCTCGACG GCAGTAAGGA AGGCAAAGCC AGAATGTCTT 

401 TTTTTCTGTT CGGGCTGACG GTCGCACCGC TTTTGGGTTT TTACGACGGT 

451 GTGTTCGGAC CGGGTGTCGG CTCGTTTTTT CTGATTGCCT TTATTGTTTT 

501 GCTCGGCTGC AAGCTGTTGA ACGCGATGTC TTACACCAAA TTGGCGAACG 

551 TTGCCTGCAA TCTTGGTTCG CTATCGGTAT TCCTGCTGCA CGGTTCGATT 

601 ATTTTCCCGA TTGCGGCAAC GATGGCGGTC GGTGCGTTTG TCGGTGCGAA 

651 TTTAGGTGCG AGATTTGCCG TCCGCTTCGG TTCGAAGCTG ATTAAGCCGC 

701 TGCTGATTGT CATCAGCATT TCGATGGCTG TGAAATTGTT GATAGACGAG 

751 AGAAATCCGC TGTATCAGAT GATTGTTTCG ATGTTTTAA 

This corresponds to the amino acid sequence <SEQ ID 434; ORF109-1>: 



1 MEDLYIILAL GLVAMIAGFI DAIAGGGGLI TLPALLLAGI PPVSAIATNK

51 LQAAAATFSA TVSFARKGLI DWKKGLPIAA ASFVGGVAGA LSVSLVSKDI

101 LLAVVPVLLI FVALYFVFSP KLDGSKEGKA RMSFFLFGLT VAPLLGFYDG

151 VFGPGVGSFF LIAFIVLLGC KLLNAMSYTK LANVACNLGS LSVFLLHGSI

201 IFPIAATMAV GAFVGANLGA RFAVRFGSKL IKPLLIVISI SMAVKLLIDE

251 RNPLYQMIVS MF*

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A) 

ORF109 shows 95.9% identity over a 147aa overlap with an ORF (ORF109a) from strain A of N.
meningitidis:

10 20 30 40 50 60 

orf109.pep MEDLYIILALGLVAMIAGFIDAIAGGGGLITLPALLLAGIPPVSAIATNKLQAAAATFSA
MM I I! I I IN Ml IIM1I I Mil I II I I II I II II I I I I I II I Mill I M I III II 
orf109a MEDLYIILALGLVAMIAGFIDAIAGGGGLITLPALLLAGIPPVSAIATNKLQAAAATFSA

10 20 30 40 50 60 



70 80 90 100 110 120 

orf109.pep TVSFARKGLIDWKKGLPIAAASFVGGVAGALSVSLVSKDILLAVVPVLLIFVALYFVFSP
I I I I I I I I I I M I I I I I I I I I I I I I I : I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
orf109a TVSFARKGLIDWKKGLPIAAASFAGGVVGALSVSLVSKDILLAVVPVLLIFVALYFVFSP

70 80 90 100 110 120 
130 140 150 160 170 180

orf109.pep KLDGSKEGKARMSFFLFGLTVXTAFGFLRRCVRTGCRLVFSDCLYCFARLQAVERDVLHQ

I I I I I I I I I I I I I I I I I I I I I : I I 
orf109a KLDGSKEGKARMSFFLFGLTVAPLLGFYDGVFGPGVGSFFLIAFIVLLGCKLLNAMSYTK

130 140 150 160 170 180 

The complete length ORF109a nucleotide sequence <SEQ ID 435> is:

1 ATGGAAGATT TATACATAAT ACTCGCTTTG GGTTTGGTTG CGATGATTGC 

51 CGGATTTATC GATGCGATTG CGGGTGGGGG TGGTTTGATT ACGCTGCCTG 

101 CACTCTTGTT GGCAGGTATT CCTCCCGTGT CGGCAATTGC CACCAACAAG 

151 CTGCAAGCAG CCGCTGCTAC GTTTTCGGCT ACGGTTTCTT TTGCACGCAA 

201 AGGTTTGATT GATTGGAAGA AAGGTCTCCC GATTGCGGCA GCATCGTTTG 

251 CAGGCGGCGT GGTCGGTGCA TTATCGGTCA GCTTGGTTTC CAAAGATATT 

301 CTGCTGGCGG TCGTGCCGGT TTTGTTGATA TTTGTCGCGC TGTATTTTGT 

351 GTTTTCGCCC AAGCTCGACG GCAGTAAGGA AGGCAAAGCC AGAATGTCTT 

401 TTTTTCTGTT CGGTCTGACG GTTGCACCAC TTTTGGGTTT TTACGACGGT 

451 GTGTTCGGAC CGGGTGTCGG CTCGTTTTTT CTGATTGCCT TTATTGTTTT 

501 GCTCGGCTGC AAGCTGTTGA ACGCGATGTC TTACACCAAA TTGGCGAACG 

551 TTGCCTGCAA TCTTGGTTCG CTATCGGTAT TCCTGCTGCA CGGTTCGATT 

601 ATTTTCCCGA TTGCGGCAAC GATGGCGGTC GGTGCGTTTG TCGGTGCGAA 

651 TTTAGGTGCG AGATTTGCCG TCCGCTTCGG TTCGAAGCTG ATTAAGCCGC 

701 TGCTGATTGT CATCAGCATT TCGATGGCTG TGAAATTGTT GATAGACGAG 

751 AGAAATCCGC TGTATCAGAT GATTGTTTCG ATGTTTTAA 

This encodes a protein having amino acid sequence <SEQ ID 436>: 

1 MEDLYIILAL GLVAMIAGFI DAIAGGGGLI TLPALLLAGI PPVSAIATNK

51 LQAAAATFSA TVSFARKGLI DWKKGLPIAA ASFAGGVVGA LSVSLVSKDI

101 LLAVVPVLLI FVALYFVFSP KLDGSKEGKA RMSFFLFGLT VAPLLGFYDG

151 VFGPGVGSFF LIAFIVLLGC KLLNAMSYTK LANVACNLGS LSVFLLHGSI

201 IFPIAATMAV GAFVGANLGA RFAVRFGSKL IKPLLIVISI SMAVKLLIDE

251 RNPLYQMIVS MF*

ORF109a and ORF109-1 show 99.2% identity in a 262 aa overlap:

10 20 30 40 50 60 

orf109a.pep MEDLYIILALGLVAMIAGFIDAIAGGGGLITLPALLLAGIPPVSAIATNKLQAAAATFSA
I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II i I I I I I I I I I I I I I I I I 1 I 
orf109-1 MEDLYIILALGLVAMIAGFIDAIAGGGGLITLPALLLAGIPPVSAIATNKLQAAAATFSA

10 20 30 40 50 60 

70 80 90 100 110 120 

orf109a.pep TVSFARKGLIDWKKGLPIAAASFAGGVVGALSVSLVSKDILLAVVPVLLIFVALYFVFSP
M I I I I I I I I I I I I I I I II I I I I : I I I : I I I I I I II I I I I I I i I I I I I I If II I I M ! I I 
orf109-1 TVSFARKGLIDWKKGLPIAAASFVGGVAGALSVSLVSKDILLAVVPVLLIFVALYFVFSP

70 80 90 100 110 120 

130 140 150 160 170 180 

orf109a.pep KLDGSKEGKARMSFFLFGLTVAPLLGFYDGVFGPGVGSFFLIAFIVLLGCKLLNAMSYTK
I I I I I I It I I I I I I I I I I I I I t I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
orf109-1 KLDGSKEGKARMSFFLFGLTVAPLLGFYDGVFGPGVGSFFLIAFIVLLGCKLLNAMSYTK

130 140 150 160 170 180 

190 200 210 220 230 240 

orf109a.pep LANVACNLGSLSVFLLHGSIIFPIAATMAVGAFVGANLGARFAVRFGSKLIKPLLIVISI
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I i I I I I I I I I I I I I I 1 I I I I I I 
orf109-1 LANVACNLGSLSVFLLHGSIIFPIAATMAVGAFVGANLGARFAVRFGSKLIKPLLIVISI

190 200 210 220 230 240 

250 260 
orf109a.pep SMAVKLLIDERNPLYQMIVSMFX

I I I I I I I I I I I I I I I I I I I I I I I 
orf109-1 SMAVKLLIDERNPLYQMIVSMFX

250 260 
Homology with a predicted ORF from N. gonorrhoeae

ORF109 shows 98.3% identity over a 231aa overlap with a predicted ORF (ORF109.ng) from N.
gonorrhoeae: 

orf109.pep MEDLYIILALGLVAMIAGFIDAIAGGGGLITLPALLLAGIPPVSAIATNKLQAAAATFSA 60

I I I I I I I I I I I I I I I I I I I I I II I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
orf109ng MEDLYIILALGLVAMIAGFIDAIAGGGGLITLPALLLAGIPPVSAIATNKLQAAAATFSA 60

orf109.pep TVSFARKGLIDWKKGLPIAAASFVGGVAGALSVSLVSKDILLAVVPVLLIFVALYFVFSP 120

I I I I I I I I I I I I I I I I I I I I I I I : I I I : I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
orf109ng TVSFARKGLIDWKKGLPIAAASFAGGVVGALSVSLVSKDILLAVVPVLLIFVALYFVFSP 120

orf109.pep KLDGSKEGKARMSFFLFGLTVXTAFGFLRRCVRTGCRLVFSDCLYCFARLQAVERDVLHQ 180

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
orf109ng KLDGSKEGKARMSFFLFGLTVATAFGFLRRCVRTGCRLVFSDCLYCFARLQAVERDVLHQ 180

orf109.pep IGERCLQSWFAIGIPAARFDYFPDCGNDGGRCVCRCEFRCEICRTLRFEAD 231

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 1 I I I I I I I I I I II 
orf109ng IGERCLQSWFAIGIPAARFDYFPDCGNDGGRCVCRCEFRCEICRPLRFEAD 231

An ORF109ng nucleotide sequence <SEQ ID 437> was predicted to encode a protein having amino
acid sequence <SEQ ID 438>:

1 MEDLYIILAL GLVAMIAGFI DAIAGGGGLI TLPALLLAGI PPVSAIATNK

51 LQAAAATFSA TVSFARKGLI DWKKGLPIAA ASFAGGVVGA LSVSLVSKDI

101 LLAVVPVLLI FVALYFVFSP KLDGSKEGKA RMSFFLFGLT VATAFGFLRR

151 CVRTGCRLVF SDCLYCFARL QAVERDVLHQ IGERCLQSWF AIGIPAARFD

201 YFPDCGNDGG RCVCRCEFRC EICRPLRFEA D*

Further work revealed the following gonococcal DNA sequence <SEQ ID 439>:

1 ATGGAAGATT TATACATAAT ACTCGCTTTG GGTTTGGTTG CGATGATCGC 

51 CGGATTTATC GATGCGATTG CGGGCGGGGG TGGTTTGATT ACGCTGCCTG 

101 CACTCTTGTT GGCAGGTATT CCTCCCGTGT CGGCAATTGC CACCAACAAG 

151 CTGCAAGCAG CCGCTGCTAC GTTTTCGGCT ACGGTTTCTT TTGCACGCAA 

201 AGGTTTGATT GATTGGAAGA AAGGTCTCCC GATTGCCGCA GCATCGTTTG 

251 CAGGCGGCGT GGTCGGTGCA TTATCGGTCA GCTTGGTTTC CAAAGATATT 

301 TTGCTGGCGG TCGTGCCGGT TTTGTTGATA TTTGTCGCGC TGTATTTTGT 

351 GTTTTCGCCC AAGCTCGACG GCAGTAAGGA AGGCAAAGCC AGAATGTCTT 

401 TTTTTCTATT CGGGCTGACG GTTGCACCGC TTTTGGGTTT TTACGACGGT 

451 GTGTTCGGAC CGGGTGTCGG CTCGTTTTTT CTGATTGCCT TTATTGTTTT 

501 GCTCGGCTGC AAGCTGTTGA ACGCGATGTC TTACACCAAA TTGGCGAACG

551 TTGCTTGCAA TCTTGGTTCG CTATCGGTAT TCCTGCTGCA CGGTTCGATT 

601 ATTTTCCCGA TTGTGGCAAC GATGGCGGTC GGTGCGTTTG TCGGTGCGAA 

651 TTTAGGTGCG AGATTTGCCG TCCGCTTGGG TTCGAAGCTG ATTAAGCCGC 

701 TGCTGATTGT CATCAGCATT TCGATGGCTG TGAAATTGTT GATAGACGAG 

751 AGAAATCCGC TGTATCAGAT GATTGTTTCG ATGTTTTAA 

This corresponds to the amino acid sequence <SEQ ID 440; ORF109ng-l>: 

1 MEDLYIILAL GLVAMIAGFI DAIAGGGGLI TLPALLLAGI PPVSAIATNK

51 LQAAAATFSA TVSFARKGLI DWKKGLPIAA ASFAGGVVGA LSVSLVSKDI

101 LLAVVPVLLI FVALYFVFSP KLDGSKEGKA RMSFFLFGLT VAPLLGFYDG

151 VFGPGVGSFF LIAFIVLLGC KLLNAMSYTK LANVACNLGS LSVFLLHGSI

201 IFPIVATMAV GAFVGANLGA RFAVRFGSKL IKPLLIVISI SMAVKLLIDE

251 RNPLYQMIVS MF*

ORF109ng-1 and ORF109-1 show 98.9% identity in a 262 aa overlap:

10 20 30 40 50 60 

orf109ng-1.pep MEDLYIILALGLVAMIAGFIDAIAGGGGLITLPALLLAGIPPVSAIATNKLQAAAATFSA
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I 
orf109-1 MEDLYIILALGLVAMIAGFIDAIAGGGGLITLPALLLAGIPPVSAIATNKLQAAAATFSA

10 20 30 40 50 60 

70 80 90 100 110 120 

orf109ng-1.pep TVSFARKGLIDWKKGLPIAAASFAGGVVGALSVSLVSKDILLAVVPVLLIFVALYFVFSP
I I Mill I I I I M I I ! II 11111:1 I 1:111 II I Ml I Ml I I I II I I I i I MMI1II I
orf109-1 TVSFARKGLIDWKKGLPIAAASFVGGVAGALSVSLVSKDILLAVVPVLLIFVALYFVFSP

70 80 90 100 110 120 

130 140 150 160 170 180 

orf109ng-1.pep KLDGSKEGKARMSFFLFGLTVAPLLGFYDGVFGPGVGSFFLIAFIVLLGCKLLNAMSYTK
IIIIIIIIIIIIMIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII 
orf109-1 KLDGSKEGKARMSFFLFGLTVAPLLGFYDGVFGPGVGSFFLIAFIVLLGCKLLNAMSYTK

130 140 150 160 170 180 

190 200 210 220 230 240 

orf109ng-1.pep LANVACNLGSLSVFLLHGSIIFPIVATMAVGAFVGANLGARFAVRFGSKLIKPLLIVISI
M M II M M M M M M M M I I: M M I M M M M M M M M M M M M M M M 
orf109-1 LANVACNLGSLSVFLLHGSIIFPIAATMAVGAFVGANLGARFAVRFGSKLIKPLLIVISI

190 200 210 220 230 240 

250 260 
orf109ng-1.pep SMAVKLLIDERNPLYQMIVSMFX
M M M M M M M M M I M M 
orf109-1 SMAVKLLIDERNPLYQMIVSMFX

250 260 
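Listing the mismatched positions shows where the 98.9% figure comes from. The overlap has no gaps, so the sketch below compares ORF109-1 <SEQ ID 434> with ORF109ng-1 <SEQ ID 440> position by position. It finds 3 differences in 262 residues, and 259/262 rounds to 98.9%. Residues 101 to 110 are written LLAVVPVLLI, and residues 81 to 90 of ORF109ng-1 are written ASFAGGVVGA. These are the residues encoded by the nucleotide sequences <SEQ ID 433> and <SEQ ID 439>.

```python
def mismatches(seq_a, seq_b):
    """1-based positions where two gap-free, equal-length sequences differ,
    as (position, residue in seq_a, residue in seq_b)."""
    a = "".join(seq_a.split())
    b = "".join(seq_b.split())
    if len(a) != len(b):
        raise ValueError("expected a gap-free alignment of equal-length sequences")
    return [(i + 1, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]


ORF109_1 = """
MEDLYIILAL GLVAMIAGFI DAIAGGGGLI TLPALLLAGI PPVSAIATNK
LQAAAATFSA TVSFARKGLI DWKKGLPIAA ASFVGGVAGA LSVSLVSKDI
LLAVVPVLLI FVALYFVFSP KLDGSKEGKA RMSFFLFGLT VAPLLGFYDG
VFGPGVGSFF LIAFIVLLGC KLLNAMSYTK LANVACNLGS LSVFLLHGSI
IFPIAATMAV GAFVGANLGA RFAVRFGSKL IKPLLIVISI SMAVKLLIDE
RNPLYQMIVS MF
"""

ORF109NG_1 = """
MEDLYIILAL GLVAMIAGFI DAIAGGGGLI TLPALLLAGI PPVSAIATNK
LQAAAATFSA TVSFARKGLI DWKKGLPIAA ASFAGGVVGA LSVSLVSKDI
LLAVVPVLLI FVALYFVFSP KLDGSKEGKA RMSFFLFGLT VAPLLGFYDG
VFGPGVGSFF LIAFIVLLGC KLLNAMSYTK LANVACNLGS LSVFLLHGSI
IFPIVATMAV GAFVGANLGA RFAVRFGSKL IKPLLIVISI SMAVKLLIDE
RNPLYQMIVS MF
"""

diffs = mismatches(ORF109_1, ORF109NG_1)
length = len("".join(ORF109_1.split()))
print(diffs)  # [(84, 'V', 'A'), (88, 'A', 'V'), (205, 'A', 'V')]
print(f"{100 * (length - len(diffs)) / length:.1f}% identity")  # 98.9% identity
```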

In addition, ORF109ng-1 shows homology to a hypothetical Pseudomonas protein:

sp|P29942|YCB9_PSEDE HYPOTHETICAL 27.4 KD PROTEIN IN COBO 3' REGION (ORF9)
>gi|94984|pir||I38164 hypothetical protein 9 - Pseudomonas sp. >gi|551929
(M62866) ORF9 [Pseudomonas denitrificans] Length = 261
Score = 175 bits (439), Expect = 3e-43

Identities = 83/214 (38%), Positives = 131/214 (60%), Gaps = 1/214 (0%)

Query: 41 PPVSAIATNKLQXXXXXXXXXXXXXRKGLIDWKKGLPIXXXXXXXXXXXXXXXXXXXKDI 100 

PP+ + TNKLQ R+G ++ K+ LP+ D+ 

Sbjct: 43 PPLQTLGTNKLQGLFGSGSATLSYARRGHVNLKEQLPMALMSAAGAVLGALLATIVPGDV 102 

Query: 101 LLAVVPVLLIFVALYFVFSPKLDGSKEGKARMSFFLFGLTVAPLLGFYDGVFGPGVGSFF 160

L A++P LLI +ALYF P + G + +R++ F+F LT+ PL+GFYDGVFGPG GSFF 
Sbjct: 103 LKAILPFLLIAIALYFGLKPNM-GDVDQHSRVTPFVFTLTLVPLIGFYDGVFGPGTGSFF 161 

Query: 161 LIAFIVLLGCKLLNAMSYTKLANVACNLGSLSVFLLHGSIIFPIVATMAVGAFVGANLGA 220

++ F+ L G +L A ++TK N N+G+ VFL G++++ + M +G F+GA +G+ 
Sbjct: 162 MLGFVTLAGFGVLKATAHTKFLNFGSNVGAFGVFLFFGAVLWKVGLLMGLGQFLGAQVGS 221 

Query: 221 RFAVRFGSKLIKPLLIVISISMAVKLLIDERNPL 254 

R+A+ G+K+IKPLL+++SI++A++LL D +PL 
Sbjct: 222 RYAMAKGAKIIKPLLVIVSIALAIRLLADPTHPL 255 

Based on this analysis, including the presence of a putative leader sequence (double-underlined) 
and several putative transmembrane domains (single-underlined) in the gonococcal protein, it is 
predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be
useful antigens for vaccines or diagnostics, or for raising antibodies. 



Example 52 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 441>: 

1 . . CTGCTAGGGT ATTGCATCGG TTATCGGTAC GGCTGTTGCA GCAAAACCAG 

51 CCGCAGACGG ATTATTTGGT CAAATTCGGA TCGTTTTGGG CGAG.ATTTT 

101 TGGTTTTCTG GGACTGTATG ACGTCTATGC TTCGGCATGG TTTGTCGTTA 

151 TCATGATGTT TTTGGTGGTT TCTACCAGTT TGTGCCTGAT TCGCAATGTG 

201 CCGCCGTTCT GGCGCGAAAT GAAGTCTTTT CGGGAAAAGG TTAAAGAAAA 

251 ATCTCTGGCG GCGATGCGCC ATTCTTCGCT GTTGGATGTA AAAATTGCGC 

301 CCGAGGTTGC CAAACGTTAT CTGGAAGTAC AAGGTTTTCA GGGGAAAACC 

351 ATTAACCGTG AAGACGGGTC GGTTCTGATT GCCGCCAAAA AAGGCACAAT 

401 GAACAAATGG GGCTATATCT TTGCCCATGT TGCTTTGATT GTCATTTGCC 

451 TGGGCGGGTT GATAGACAGT AACCTGCTGT TGAAACTGGG TATGCTGACC 

501 GGTCGGATTG TTCCGGACAA TCAGGCGGTT TATGCCAAGG ATTTC.AAGC 






551 CCGAAAGTAT . TTTGGGTGC gTCCAATCTC TCATTTAGGG GCAACGTCAA 
601 TATTTCCG . A GGGGCAGAgT GCGGATGTGG TTTTCCTGA 

This corresponds to the amino acid sequence <SEQ ID 442; ORF110>:

1 ..LLGIASVIGT LLQQNQPQTD YLVKFGSFWA XIFGFLGLYD VYASAWFVVI

51 MMFLVVSTSL CLIRNVPPFW REMKSFREKV KEKSLAAMRH SSLLDVKIAP

101 EVAKRYLEVQ GFQGKTINRE DGSVLIAAKK GTMNKWGYIF AHVALIVICL 

151 GGLIDSNLLL KLGMLTGRIF RTIRRFMPRI XKPESXFGCV QSLI*GQRQY 

201 FXRGRVRMWF S* 

Computer analysis of this amino acid sequence gave the following results: 
Homology with ORF88a from N. meningitidis (strain A) 

ORF110 shows 91.5% identity over a 188aa overlap with ORF88a from strain A of N. meningitidis:



orf88a.pep MSKSRRSPPLLSRPWFAFFSSMRFAVALLSLLGIASVIGTVLQQNQPQTDYLVKFGSFWA 60
I i I I I I I I I I : I I I I I I I I I I I I I I I I I I I
orf110 LLGIASVIGTLLQQNQPQTDYLVKFGSFWA 30

orf88a.pep QIFGFLGLYDVYASAWFVVIMMFLVVSTSLCLIRNVPPFWREMKSFREKVKEKSLAAMRH 120
III I II II III II I I I Ml Ml MM I I I II I I I I II IN M I I I M I I II M IN M
orf110 XIFGFLGLYDVYASAWFVVIMMFLVVSTSLCLIRNVPPFWREMKSFREKVKEKSLAAMRH 90

orf88a.pep SSLLDVKIAPEVAKRYLEVQGFQGKTINREDGSVLIAAKKGTMNKWGYIFAHVALIVICL 180
M I M M M M M M M M M M I I I M M I M 1 M I M M I M M M I I I M M M I I I
orf110 SSLLDVKIAPEVAKRYLEVQGFQGKTINREDGSVLIAAKKGTMNKWGYIFAHVALIVICL 150

orf88a.pep GGLIPSNLLLKLGMLTGRIVPDNQAVYAKDFKPESILGASNLSFRGNVNISEGQSADVVF 240
I M II II M M M M I I I I : : : M II M
orf110 GGLIDSNLLLKLGMLTGRIFRTIRRFMPRIXKPESXFGCVQSLIXGQRQYFXRGRVRMWF 210

orf88a.pep LNADNGILVQPLPFEVKLKKFHIPFYNTGMPRPFASDIEVTDKATGEKLERTIRVNHPLT 300
orf110 SX



However, ORF88 and ORF110 do not align with each other, because they represent two different fragments of the
same protein. 

Homology with a predicted ORF from N. gonorrhoeae

ORF110 shows 88.6% identity over a 211aa overlap with a predicted ORF (ORF110.ng) from N.
gonorrhoeae:

orf110.pep LLGIASVIGTLLQQNQPQTDYLVKFGSFWA 30
I I I I I I I 1 I I : I I I I I I I I II I I 1 I I II:
orf110ng MSKSRISPTLLSRPWFAFFSSMRFAVALLSLLGIASVIGTVLQQNQPQTDYLVKFGPFWT 60

orf110.pep XIFGFLGLYDVYASAWFVVIMMFLVVSTSLCLIRNVPPFWREMKSFREKVKEKSLAAMRH 90
II I I I I II II I I I I II II II II I I II II I II II I II I II II M M I I I I II M I M I I
orf110ng RIFDFLGLYDVYASAWFVVIMMFLVVSTSLCLIRNVPPFWREMKSFREKVKEKSLAAMRH 120

orf110.pep SSLLDVKIAPEVAKRYLEVQGFQGKTINREDGSVLIAAKKGTMNKWGYIFAHVALIVICL 150
II I I I I I I I I I I I I I I I I I : I I I I I I I I I I I I M I I I I I I I 1 I I I I I I I I I I I I I I I
orf110ng SSLLDVKIAPEVAKRYLEVRGFQGKTVSREDGSVLIAAKKGTMNKWGYIXAHVALIVICL 180
orf110.pep GGLIDSNLLLKLGMLTGRIFRTIRRFMPRIXKPESXFGCVQSLIXGQRQYFXRGRVRMWF 210

I II: 111111111:1 Mh II 1 I I I I I I I : I I I I I I I I I I I I I I : I I I I I 
orfllOng GRLINXNLLLKLGMLAGSIFRNNRRVMPRISKPESIWGGVQSLIKGQRQYFQRGKVRMWF 240 

orfllO.pep S 211 
I 

orfllOng S 241 

The complete length ORF110ng nucleotide sequence <SEQ ID 443> is predicted to encode a protein having amino acid sequence <SEQ ID 444>:

1 MSKSRISPTL LSRPWFAFFS SMRFAVALLS LLGIASVIGT VLQQNQPQTD

51 YLVKFGPFWT RIFDFLGLYD VYASAWFVVI MMFLVVSTSL CLIRNVPPFW

101 REMKSFREKV KEKSLAAMRH SSLLDVKIAP EVAKRYLEVR GFQGKTVSRE

151 DGSVLIAAKK GTMNKWGYIX AHVALIVICL GRLINXNLLL KLGMLAGSIF

201 RNNRRVMPRI SKPESIWGGV QSLIKGQRQY FQRGKVRMWF S*

Based on the putative transmembrane domains in the gonococcal protein, it is predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies.

Example 53 

The following DNA sequence was identified in N. meningitidis <SEQ ID 445>: 

1 ATGCCGTCTG AAACACGCCT GCCGAACTTT ATCCGCGTCT TGATATTTGC 

51 CCTGGGTTTC ATCTTCCTGA ACGCCTGTTC GGAACAAACC GCGCAAACCG 

101 TTACCCTGCA AGGCGAAACG ATGGGCACGA CCTATACCGT CAAATACCTT 

151 TCAAATAATC GGGACAAACT CCCCTCACCT GCCGAAATAC AAAAACGCAT 

201 CGATGACGCG CTTAAAGAAG TCAACCGGCA GATGTCCACC TATCAGCCCG 

251 ACTCCGAAAT CAGCCGGTTC AACCAACACA CAGCCGGCAA GCCCCTCCGC 

301 ATTTCAAGCG ACTTCGCACA CGTTACTGCC GAAGCCGTCC GCCTGAACCG 

351 CCTGACACAC GGCGCGCTGG ACGTAACCGT CGGCCCCTTG GTCAACCTTT 

401 GGGGATTCGG CCCCGACAAA TCCGTTACCC GTGAACCGTC GCCGGAACAA 

451 ATCAAACAGG CGGCATCTTA TACGGGCATA GACAAAATCA TTTTGAAACA 

501 AGGCAAAGAT TACGCTTCCT TGAGCAAAAC CCACCCCAAG GCCTATTTGG 

551 ATTTATCTTC GATTGCCAAA GGCTTCGGCG TTGATAAAGT TGCGGGCGAA 

601 CTGGAAAAAT ACGGCATTCA AAATTATCTG GTCGAAATCG GCGGCGAGTT 

651 GCACGGCAAA GGCAAAAACG CGCGCGGCGA ACCGTGGCGC ATCGGTATCG 

701 AGCAGCCCAA TATCGTCCAA GGCGGCAATA CGCAGATTAT CGTCCCGCTG 

751 AACAACCGTT CGCTTGCCAC TTCCGGCGAT TACCGTATTT TCCACGTCGA 

801 TAAAAACGGC AAACGCCTCT CCCATATCAT CAACCCGAAC AACAAACGAC 

851 CCATCAGCCA CAACCTCGCC TCCATCAGCG TGGTCGCAGA CAGTGCGATG 

901 ACGGCGGACG GCTTGTCCAC AGGATTATTC GTATTGGGCG AAACCGAAGC 

951 CTTAAAGCTG GCAGAGCGCG AAAAACTCGC TGTTTTCCTG ATTGTCAGGG 

1001 ATAAAGGCGG CTACCGCACC GCCATGTCTT CCGAATTTGA AAAACTGCTC 

1051 CGCTAA 

This corresponds to the amino acid sequence <SEQ ID 446; ORF111>:

1 MPSETRLPNF IRVLIFALGF IFLNACSEQT AQTVTLQGET MGTTYTVKYL

51 SNNRDKLPSP AEIQKRIDDA LKEVNRQMST YQPDSEISRF NQHTAGKPLR 

101 ISSDFAHVTA EAVRLNRLTH GALDVTVGPL VNLWGFGPDK SVTREPSPEQ

151 IKQAASYTGI DKIILKQGKD YASLSKTHPK AYLDLSSIAK GFGVDKVAGE 

201 LEKYGIQNYL VEIGGELHGK GKNARGEPWR IGIEQPNIVQ GGNTQIIVPL 

251 NNRSLATSGD YRIFHVDKNG KRLSHIINPN NKRPISHNLA SISVVADSAM

301 TADGLSTGLF VLGETEALKL AEREKLAVFL IVRDKGGYRT AMSSEFEKLL 

351 R* 
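The correspondence between the nucleotide sequence and the amino acid sequence above is ordinary in-frame translation with the standard genetic code. The following Python sketch illustrates that step only; it is not taken from this specification. The helper name `translate` is hypothetical, and rendering any codon that contains a non-ACGT symbol (for example an IUPAC ambiguity code such as N) as X is an assumption that matches how unresolved residues appear in these listings.

```python
# Standard genetic code; codons are enumerated with bases in the order T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate an in-frame coding sequence; '*' marks a stop codon.

    Spaces (used to group bases in tens) are ignored and case is folded.
    Any codon containing a non-ACGT symbol is translated as X
    (an assumption made for this sketch).
    """
    seq = "".join(dna.split()).upper()
    return "".join(
        CODON_TABLE.get(seq[pos:pos + 3], "X")
        for pos in range(0, len(seq) - 2, 3)
    )


print(translate("ATGCCGTCTG AAACACGCCT GCCGAACTTT"))  # MPSETRLPNF
print(translate("CGCTAA"))  # R*
```

The first call reproduces the first ten residues of ORF111 from the first 30 bases of <SEQ ID 445>, and the second the final arginine and stop codon.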

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A)

ORF111 shows 96.9% identity over a 351 aa overlap with an ORF (ORF111a) from strain A of N. meningitidis:

                    10        20        30        40        50        60
orf111a.pep MPSETRLPNFIRTLIFALSFIFLNACSEQTAQTVTLQGETMGTTYTVKYLSNNRDXLPSP
orf111      MPSETRLPNFIRVLIFALGFIFLNACSEQTAQTVTLQGETMGTTYTVKYLSNNRDKLPSP
                    10        20        30        40        50        60

                    70        80        90       100       110       120
orf111a.pep AEIQXRIDDALKEVNRQMSTYQPDSEISRFNQHTAGKPLRISSDFAHVTAEAVHLNRLTH
orf111      AEIQKRIDDALKEVNRQMSTYQPDSEISRFNQHTAGKPLRISSDFAHVTAEAVRLNRLTH
                    70        80        90       100       110       120

                   130       140       150       160       170       180
orf111a.pep GALDVTVGPLVNLWGFGPDKSVTREPSPEQIKQAASYTGIDKIILKQGKDYASLSKTHPK
orf111      GALDVTVGPLVNLWGFGPDKSVTREPSPEQIKQAASYTGIDKIILKQGKDYASLSKTHPK
                   130       140       150       160       170       180

                   190       200       210       220       230       240
orf111a.pep AYLDLSSIAKGFGVDXVAGELEKYGIQNYLVEIGGELHGKXKNARGEPWRIGIEQPNIVQ
orf111      AYLDLSSIAKGFGVDKVAGELEKYGIQNYLVEIGGELHGKGKNARGEPWRIGIEQPNIVQ
                   190       200       210       220       230       240

                   250       260       270       280       290       300
orf111a.pep GGNTQIIVPLNNRSXATSGDYRIFHVDKSGKRLSHIINPNNKRPISHNLASISVXADSAM
orf111      GGNTQIIVPLNNRSLATSGDYRIFHVDKNGKRLSHIINPNNKRPISHNLASISVVADSAM
                   250       260       270       280       290       300

                   310       320       330       340       350
orf111a.pep TADGXSTGLFVLGETEALKLAEREKLAVFLIVRDKGGYRTAMSSEFEKLLRX
orf111      TADGLSTGLFVLGETEALKLAEREKLAVFLIVRDKGGYRTAMSSEFEKLLRX
                   310       320       330       340       350
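A percent identity figure such as 96.9% over a 351 aa overlap is the number of identical aligned positions divided by the length of the overlap. In the alignment above the two sequences differ at 11 of the 351 residue positions, and 340/351 = 96.87%, which is reported as 96.9%. The specification does not state how its alignment program treats gaps or X residues, so the Python sketch below (hypothetical helper `percent_identity`) assumes the simplest convention: identical, non-gap columns divided by all aligned columns.

```python
def percent_identity(a: str, b: str) -> float:
    """Identical, non-gap columns as a percentage of all aligned columns.

    `a` and `b` are two rows of a pairwise alignment of equal length, with
    '-' for gaps. Other programs may count gaps or X residues differently.
    """
    if len(a) != len(b):
        raise ValueError("aligned rows must have the same length")
    matches = sum(1 for x, y in zip(a, b) if x == y and x != "-")
    return 100.0 * matches / len(a)


print(percent_identity("SISVXADSAM", "SISVVADSAM"))  # 90.0
print(round(100 * 340 / 351, 1))  # 96.9
```

Because an X never equals a defined residue, unresolved positions in either sequence count as mismatches under this convention.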

The complete length ORF111a nucleotide sequence <SEQ ID 447> is:

1 ATGCCGTCTG AAACACGCCT GCCGAACTTT ATCCGCACCT TGATATTTGC 

51 CCTGAGTTTT ATCTTCCTGA ACGCCTGTTC GGAACAAACC GCGCAAACCG 

101 TTACCCTGCA AGGTGAAACG ATGGGCACGA CCTATACCGT CAAATACCTT 

151 TCAAATAATC GGGACNAACT CCCNTCACCT GCCGAAATAC AAAANCGCAT 

201 CGATGACGCG CTTAAAGAAG TCAACCGGCA GATGTCCACC TATCAGCCCG 

251 ACTCCGAAAT CAGCCGGTTC AACCAACACA CAGCCGGCAA GCCCCTCCGC 

301 ATTTCAAGCG ACTTCGCACA CGTTACTGCC GAAGCCGTCC ACCTGAACCG 

351 CCTGACACAC GGCGCGCTGG ACGTAACCGT CGGCCCCTTG GTCAACCTTT 

401 GGGGATTCGG CCCCGACAAA TCCGTTACCC GTGAACCGTC GCCGGAACAA 

451 ATCAAACAAG CAGCATCTTA TACGGGCATA GACAAAATCA TTTTGAAACA 

501 AGGCAAAGAT TACGCTTCCT TGAGCAAAAC CCACCCCAAG GCCTATTTGG 

551 ATTTATCTTC GATTGCCAAA GGCTTCGGCG TTGATNANGT TGCGGGCGAA 

601 CTGGAAAAAT ACGGCATTCA AAATTATCTG GTCGAAATCG GCGGNGAGTT 

651 GCACGGCAAA GNCAAAAACG CGCGCGGCGA ACCTTGGCGC ATCGGCATCG 

701 AACAGCCCAA CATCGTCCAA GGCGGCAATA CGCAGATTAT CGTCCCGCTG 

751 AACAACCGTT CGNTTGCCAC TTCCGGCGAT TACCGTATTT TCCACGTCGA 

801 TAAAAGCGGC AAACGCCTCT CCCATATCAT TAATCCGAAC AACAAACGAC 

851 CCATCAGCCA CAACCTCGCC TCCATCAGCG TGNTCGCAGA CAGTGCGATG 

901 ACGGCGGACG GCTTNTCCAC AGGATTATTC GTATTGGGCG AAACCGAAGC 

951 CTTAAAGCTG GCAGAGCGCG AAAAACTCGC TGTTTTCCTG ATTGTCAGGG 

1001 ATAAAGGCGG CTACCGCACC GCCATGTCTT CCGAATTTGA AAAACTGCTC 

1051 CGCTAA 

This encodes a protein having amino acid sequence <SEQ ID 448>: 



1 MPSETRLPNF IRTLIFALSF IFLNACSEQT AQTVTLQGET MGTTYTVKYL

51 SNNRDXLPSP AEIQXRIDDA LKEVNRQMST YQPDSEISRF NQHTAGKPLR

101 ISSDFAHVTA EAVHLNRLTH GALDVTVGPL VNLWGFGPDK SVTREPSPEQ

151 IKQAASYTGI DKIILKQGKD YASLSKTHPK AYLDLSSIAK GFGVDXVAGE 

201 LEKYGIQNYL VEIGGELHGK XKNARGEPWR IGIEQPNIVQ GGNTQIIVPL 

251 NNRSXATSGD YRIFHVDKSG KRLSHIINPN NKRPISHNLA SISVXADSAM 

301 TADGXSTGLF VLGETEALKL AEREKLAVFL IVRDKGGYRT AMSSEFEKLL 

351 R* 
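Each amino acid listing in this specification begins every row with the 1-based position of its first residue, with residues grouped in tens. A short Python sketch (the function name `parse_listing` is hypothetical) that reassembles such a listing and checks that every row label equals the running length plus one is a quick way to catch residues lost or merged in transcription, for example a scanned "VV" read as a single "W". Partial listings marked with ".." are outside the scope of this sketch.

```python
def parse_listing(text: str) -> str:
    """Reassemble a numbered listing such as '1 MPSETRLPNF IRTLIFALSF ...'.

    Each non-blank row must begin with the position of its first residue;
    a mismatch means residues were dropped or merged in transcription.
    """
    sequence = ""
    for line in text.splitlines():
        if not line.strip():
            continue  # skip the blank lines between rows
        start, *groups = line.split()
        expected = len(sequence) + 1
        if int(start) != expected:
            raise ValueError(f"row labelled {start}, expected {expected}")
        sequence += "".join(groups)
    return sequence


print(parse_listing("1 MPSETRLPNF IRTLIFALSF\n\n21 IFLNACSEQT"))
# MPSETRLPNFIRTLIFALSFIFLNACSEQT
```

A row that reads "SISWADSAM" where ten residues are expected leaves the next row label one position ahead of the running length, so the check raises an error.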



Homology with a predicted ORF from N. gonorrhoeae

ORF111 shows 96.6% identity over a 351aa overlap with a predicted ORF (ORF111ng) from N. gonorrhoeae:

                    10        20        30        40        50        60
orf111ng    MPSETRLPNLIRALIFALGFIFLNACSEQTAQTVTLQGETMGTTYTVKYLSNNRDKLPSP
orf111      MPSETRLPNFIRVLIFALGFIFLNACSEQTAQTVTLQGETMGTTYTVKYLSNNRDKLPSP
                    10        20        30        40        50        60

                    70        80        90       100       110       120
orf111ng    AKIQKRIDDALKEVNRQMSTYQTDSEISRFNQHTAGKPLRISSDFAHVTAEAVRLNRLTH
orf111      AEIQKRIDDALKEVNRQMSTYQPDSEISRFNQHTAGKPLRISSDFAHVTAEAVRLNRLTH
                    70        80        90       100       110       120

                   130       140       150       160       170       180
orf111ng    GALDVTVGPLVNLWGFGPDKSVTREPSPEQIKQAASYTGIDKIILQQGKDYASLSKTHPK
orf111      GALDVTVGPLVNLWGFGPDKSVTREPSPEQIKQAASYTGIDKIILKQGKDYASLSKTHPK
                   130       140       150       160       170       180

                   190       200       210       220       230       240
orf111ng    AYLDLSSIAKGFGVDKVAGELEKYGIQNYLVEIGGELHGKGKNAHGEPWRIGIEQPNIIQ
orf111      AYLDLSSIAKGFGVDKVAGELEKYGIQNYLVEIGGELHGKGKNARGEPWRIGIEQPNIVQ
                   190       200       210       220       230       240

                   250       260       270       280       290       300
orf111ng    GGNTQIIVPLNNRSLATSGDYRIFHVDKNGKRLSHIINPNNKRPISHNLASISVVSDSAM
orf111      GGNTQIIVPLNNRSLATSGDYRIFHVDKNGKRLSHIINPNNKRPISHNLASISVVADSAM
                   250       260       270       280       290       300

                   310       320       330       340       350
orf111ng    TADGLSTGLFVLGETEALRLAEQEKLAVFLIVRDKDGYRTAMSSEFAKLLRX
orf111      TADGLSTGLFVLGETEALKLAEREKLAVFLIVRDKGGYRTAMSSEFEKLLRX
                   310       320       330       340       350

The complete length ORF111ng nucleotide sequence <SEQ ID 449> is:



1 ATGCCGTCTG AAACACGCCT GCCGAACCTT ATCCGCGCCT TGATATTTGC

51 CCTGGGTTTC ATCTTCCTGA ACGCCTGTTC GGaacaaacC GCGCAaaccg

101 TTACCCTGCA AGGCGAAAcg aTGGGTACGA CCTATACCGT CAAATACCTT

151 TCAAATAATC GGGACAAACT CCCCTCCCCT GCCAAAATAC AAAAGCGCAT

201 TGATGATGCG CTTAAAGAAG TCAACCGGCA GATGTCCACC TACCAGACCG

251 ATTCCGAAAT CAGCCGGTTC AACCAACACA CAGCCGGCAA GCCCCTCCGC

301 ATTTCAAGCG ATTTCGCACA CGTTACCGCC GAAGCCGTCC GCCTGAACCG

351 CCTGACTCAC GGCGCACTGG ACGTAACCGT CGGCCCTTTG GTCAACCTTT

401 GGGGGTTCGG CCCCGACAAA TCCGTTACCC GTGAACCGTC GCCGGAACAA

451 ATCAAACAGG CGGCATCTTA TACGGGCATA GACAAAATCA TTTTGCAACA

501 AGGCAAAGAT TACGCTTCCT TGAGCAAAAC CCACCCCAAA GCCTATTTGG

551 ATTTATCTTC GATTGCCAAA GGCTTCGGCG TTGATAAAGT TGCGGGCGAA

601 CTGGAAAAAT ACGGCATTCA AAATTATCTG GTCGAAAtcg gcggcGAGTT

651 GCACGGCAAA GGCAAAAATG CGCACGGCGA ACCGTGGCGC ATCGGTATAG

701 AGCAACCCAA TATCATCCAA GgcgGCAata CGCAGATTAt cgtcccgctg

751 aaCaaccgtt cgctTGCCAC TTCCGGCGAT TAccgtaTTT tccacgtcgA

801 TAAAAAcggc aaacgccttt cccacaTCAT CAATCCCaAC aacAAACgac

851 ccATCAGcca caacctcgcc tccatcagcg tggtctcAGA CAGTGCAATG

901 ACGGCGGACG GTTtatCCAC AGGATTATTT GTTTTAGGCG AAACCGAAGC

951 CTTAAGGCTG GCAGAACAAG AAAAACTCGC TGTTTTCCTA ATTGTCCGGG

1001 ATAAGGACGG CTACCGCACC GCCATGTCTT CCGAATTTGC CAAGCTGCTC

1051 CGCTAA

This encodes a protein having amino acid sequence <SEQ ID 450>: 

1 MPSETRLPNL IRALIFALGF IFLNACSEQT AQTVTLQGET MGTTYTVKYL

51 SNNRDKLPSP AKIQKRIDDA LKEVNRQMST YQTDSEISRF NQHTAGKPLR 

101 ISSDFAHVTA EAVRLNRLTH GALDVTVGPL VNLWGFGPDK SVTREPSPEQ 

151 IKQAASYTGI DKIILQQGKD YASLSKTHPK AYLDLSSIAK GFGVDKVAGE 

201 LEKYGIQNYL VEIGGELHGK GKNAHGEPWR IGIEQPNIIQ GGNTQIIVPL 

251 NNRSLATSGD YRIFHVDKNG KRLSHIINPN NKRPISHNLA SISVVSDSAM

301 TADGLSTGLF VLGETEALRL AEQEKLAVFL IVRDKDGYRT AMSSEFAKLL 

351 R* 

This protein shows homology with a hypothetical lipoprotein precursor from H. influenzae:

sp|P44550|YOJL_HAEIN HYPOTHETICAL LIPOPROTEIN HI0172 PRECURSOR >gi|1074292|pir|4
hypothetical protein HI0172 - Haemophilus influenzae (strain Rd KW20)
>gi|1573128 (U32702) hypothetical [Haemophilus influenzae] Length = 346
Score = 353 bits (896), Expect = 9e-97

Identities = 181/344 (52%), Positives = 247/344 (71%), Gaps = 4/344 (1%)

Query: 7 LPNLIRALIFALGFIFLNACSEQTAQTVTLQGETMGTTYTVKYLSNNRDKLPSPAKIQKR 66 

+ LI +I + L AC ++T + ++L G+TMGTTY VKYL + S K +

Sbjct: 1 MKKLISGIIAVAMALSLAACQKET-KVISLSGKTMGTTYHVKYLDDGSITATSE-KTHEE 58 

Query: 67 IDDALKEVNRQMSTYQTDSEISRFNQHT-AGKPLRISSDFAHVTAEAVRLNRLTHGALDV 125

I+ LK+VN +MSTY+ DSE+SRFNQ+T P+ IS+DFA V AEA+RLN++T GALDV
Sbjct: 59 IEAILKDVNAKMSTYKKDSELSRFNQNTQVNTPIEISADFAKVLAEAIRLNKVTEGALDV 118 

Query: 126 TVGPLVNLWGFGPDKSVTREPSPEQIKQAASYTGIDKIILQQGKDYASLSKTHPKAYLDL 185 

TVGP+VNLWGFGP+K ++P+PEQ+ + ++ GIDKI L K+ A+LSK P+ Y+DL 
Sbjct: 119 TVGPVVNLWGFGPEKRPEKQPTPEQLAERQAWVGIDKITLDTNKEKATLSKALPQVYVDL 178

Query: 186 SSIAKGFGVDKVAGELEKYGIQNYLVEIGGELHGKGKNAHGEPWRIGIEQPNIIQGGNTQ 245
SSIAKGFGVD+VA +LE+ QNY+VEIGGE+ KGKN G+PW+I IE+P +

Sbjct: 179 SSIAKGFGVDQVAEKLEQLNAQNYMVEIGGEIRAKGKNIEGKPWQIAIEKPTTTGERAVE 238

Query: 246 IIVPLNNRSLATSGDYRIFHVDKNGKRLSHIINPNNKRPISHNLASISVVSDSAMTADGL 305

++ LNN +A+SGDYRI+ ++NGKR +H I+P PI H+LASI+V++ ++MTADGL 

Sbjct: 239 AVIGLNNMGMASSGDYRIY-FEENGKRFAHEIDPKTGYPIQHHLASITVLAPTSMTADGL 297 

Query: 306 STGLFVLGETEALRLAEQEKLAVFLIVRDKDGYRTAMSSEFAKL 349 

STGLFVLGE +AL +AE+ LAV+LI+R +G+ T SS F KL 
Sbjct: 298 STGLFVLGEDKALEVAEKNNLAVYLIIRTDNGFVTKSSSAFKKL 341 
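The Identities and Positives figures in the BLAST summary above are tallies over the middle line of each block: in NCBI-style output a residue letter marks an identical position, "+" marks a conservative substitution, and mismatches and gaps are left blank. The column spacing of those middle lines was lost in this copy, so the totals cannot be recounted from the text as printed, but the Python sketch below (hypothetical helper names) shows the bookkeeping on a properly aligned middle line. It also shows that the printed percentages are consistent with integer truncation rather than rounding: 181/344 is 52.6%, printed as 52%.

```python
def midline_counts(midline: str) -> tuple[int, int]:
    """Return (identities, positives) for one BLAST middle line.

    Assumes NCBI-style middle lines: a residue letter marks an identity,
    '+' marks a conservative substitution, and a blank marks a mismatch or gap.
    """
    identities = sum(1 for ch in midline if ch.isalpha())
    positives = identities + midline.count("+")
    return identities, positives


def blast_percent(count: int, length: int) -> int:
    """Percentage truncated to an integer, matching the figures printed above."""
    return (100 * count) // length


print(midline_counts("TVGP+VNLWGFGP+K"))  # (13, 15)
print(blast_percent(181, 344), blast_percent(247, 344), blast_percent(4, 344))  # 52 71 1
```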

Based on this analysis, it is predicted that the proteins from N. meningitidis and N. gonorrhoeae, and 
their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies. 



Example 54 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 451>:

1 . . CCGTGCCGCC GACAGGGCGA CGACGTGTAT GCGGCGCACG CGTCCCGTCA 

51 AAAATTGTGG CTGCGCTTCA TCGGCGGCCG GTCGCATCAA AATATACGGG 

101 GCGGCGCGGC TGCGGACGGG TGGCGCAAAG GCGTGCAAAT CGGCGGCGAG 

151 GTGTTTGTAC GGCAAAATGA AGGCAGCCkA yTGGCAATCG GCGTGATGGG 

201 CGGCAGGGCC GGCCAGCACG CwTCAGTCAA CGGCAAAGGC GGTGCGGCAG 

251 gCAGTGATTT GTATGGTTAT GgCGGGGgTG TTTATGCTgC GTGGCATCAG 

301 TTGCGCGATA AACAAACGGG TgCGTATTTG GACGGCTGGT TGCAATACCA 

351 ACGTTTCAAA CACCGCATCA ATGATGAAAA CCGTGCGGAA CgCTACAAAA 

401 CCAAAGGTTG GACGGCTTCT GTCGAAGGCG GCTACAACGC GCTTGTGGCG 

451 GAAGGCATTG TCGGAAAAGG CAATAATGTG CGGTTTTACC TACAACCGCA 

501 GgCGCAGTTT ACCTACTTGG GCGTAAACGG CGGCTTTACC GACAGCGAGG 

551 GGACGGCGGT CGGACTGCTC GGCAGCGGTC AGTGGCAAAG CCGCGCCGGC 

601 AtTCGGGCAA AAACCCGTTT TGCTTTGCGT AACGGTGTCA ATCTTCAGCC 

651 TTTTGCCGCT TTTAATGTtt TGCACAGGTC AAAATCTTTC GGCGTGGAAA 

701 TGGACGGCGA AAAACAGACG CTGGCAGGCA GGACGGCACT CGAAGGGCGG 
751 TTCGGTATTG AAGCCGGTTG GAAAGGCCAT ATGTCCGCA..

This corresponds to the amino acid sequence <SEQ ID 452; ORF35>: 



1 . . PCRRQGDDVY AAHASRQKLW LRFIGGRSHQ NIRGGAAADG WRKGVQIGGE 

51 VFVRQNEGSX LAIGVMGGRA GQHASVNGKG GAAGSDLYGY GGGVYAAWHQ 

101 LRDKQTGAYL DGWLQYQRFK HRINDENRAE RYKTKGWTAS VEGGYNALVA 

151 EGIVGKGNNV RFYLQPQAQF TYLGVNGGFT DSEGTAVGLL GSGQWQSRAG 

201 IRAKTRFALR NGVNLQPFAA FNVLHRSKSF GVEMDGEKQT LAGRTALEGR 

251 FGIEAGWKGH MSA. . 

Computer analysis of this amino acid sequence gave the following results: 



Homology with a putative secreted VirG homologue of N. meningitidis (accession number A32247)
ORF35 and the virg-h protein show 51% amino acid identity over a 261aa overlap:

Orf35 5 QGDDVYAAHASRQKLWLRFIGGRSHQNIRGGAA-ADGWRKGVQIGGEVFVRQNEGSXLAI 
+ D++ R+ LWLR I G S+Q ++G A +G+RKGVQ+GGEVF QNE + L+I 

virg-h 396 KNSDIFDRTLPRKGLWLRVIDGHSNQWVQGKTAPVEGYRKGVQLGGEVFTWQNESNQLSI 



Orf35 64 GVMGGRAGQHASVNGKG--GAAGSDLYGYGGGVYAAWHQLRDKQTGAYLDGWLQYQRFKH
G+MGG+A Q ++ + ++ G+G GVYA WHQL+DKQTGAY D W+QYQRF+H 

virg-h 456 GLMGGQAEQRSTFHNPDTDNLTTGNVKGFGAGVYATWHQLQDKQTGAYADSWMQYQRFRH 

Orf35 122 RINDENRAERYKTKGWTASVEGGYNALVAEGIVGKGNNVRFYLQPQAQFTYLGVNGGFTD 
RIN E+ ER+ +KG TAS+E GYNAL+AE KGN++R YLQPQAQ TYLGVNG F+D 
virg-h 516 RINTEDGTERFTSKGITASIEAGYNALLAEHFTKKGNSLRVYLQPQAQLTYLGVNGKFSD 

Orf35 182 SEGTAVGLLGSGQWQSRAGIRAKTRFALRNGVNLQPFAAFNVLHRSKSFGVEMDGEKQTL
SE V LLGS Q Q+R G++AK +F+L + ++PFAA N L+ +K FGVEMDGE++ +
virg-h 576 SENAHVNLLGSRQLQTRVGVQAKAQFSLYKNIAIEPFAAVNALYHNKPFGVEMDGERRVI

Orf35 242 AGRTALEGRFGIEAGWKGHMS 262 

+TA+E + G+ K H++ 
virg-h 636 NNKTAIESQLGVAVKIKSHLT 656 

Homology with a predicted ORF from N. meningitidis (strain A) 
ORF35 shows 96.9% identity over a 259aa overlap with an ORF (ORF35a) from strain A of N. meningitidis:



orf35.pep                           PCRRQGDDVYAAHASRQKLWLRFIGGRSHQNIRG 34
orf35a    QRLAIPEAEAVLYAQQAYAANTLFGLRAADRGDDVYAADPSRQKLWLRFIGGRSHQNIRG 368

orf35.pep GAAADGWRKGVQIGGEVFVRQNEGSXLAIGVMGGRAGQHASVNGKGGAAGSDLYGYGGGV 94
orf35a    GAAADGRRKGVQIGGEVFVRQNEGSRLAIGVMGGRAGQHASVNGKGGAAGSYLHGYGGGV 428

orf35.pep YAAWHQLRDKQTGAYLDGWLQYQRFKHRINDENRAERYKTKGWTASVEGGYNALVAEGIV 154
orf35a    YAAWHQLRDKQTGAYLDGWLQYQRFKHRINDENRAERYKTKGWTASVEGGYNALVAEGVV 488

orf35.pep GKGNNVRFYLQPQAQFTYLGVNGGFTDSEGTAVGLLGSGQWQSRAGIRAKTRFALRNGVN 214
orf35a    GKGNNVRFYLQPQAQFTYLGVNGGFTDSEGTAVGLLGSGQWQSRAGIRAKTRFALRNGVN 548

orf35.pep LQPFAAFNVLHRSKSFGVEMDGEKQTLAGRTALEGRFGIEAGWKGHMSA 263
orf35a    LQPFAAFNVLHRSKSFGVEMDGEKQTLAGRTALEGRFGIEAGWKGHMSARIGYGKRTDGD 608

orf35a    KEAALSLKWLFX 620

The complete length ORF35a nucleotide sequence <SEQ ID 453> is: 

1 ATGTTCAGAG CTCAGCTTGG TTCAAATACT CGTTCTACCA AAATCGGCGA 

51 CGATGCCGAT TTTTCATTTT CAGACAAGCC GAAACCCGGC ACTTCCCATT 

101 ATTTTTCCAG CGGTAAAACC GATCAAAATT CATCCGAATA TGGGTATGAC 

151 GAAATCAATA TCCAAGGTAA AAACTACAAT AGCGGCATAC TCGCCGTCGA 

201 TAATATGCCC GTTGTTAAGA AATATATTAC AGATACTTAC GGGGATAATT 

251 TAAAGGATGC GGTTAAGAAG CAATTACAGG ATTTATACAA AACAAGACCC 

301 GAAGCTTGGG AAGAAAATAA AAAACGGACT GAGGAGGCGT ATATAGAACA 

351 GCTTGGACCA AAATTTAGTA TACTCAAACA GAAAAACCCC GATTTAATTA 

401 ATAAATTGGT AGAAGATTCC GTACTCACTC CTCATAGTAA TACATCACAG 

451 ACTAGTCTCA ACAACATCTT CAATAAAAAA TTACACGTCA AAATCGAAAA 

501 CAAATCCCAC GTCGCCGGAC AGGTGTTGGA ACTGACCAAG ATGACGCTGA 

551 AAGATTCCCT TTGGGAACCG CGCCGCCATT CCGACATCCA TATGCTGGAA 

601 ACTTCCGATA ATGCCCGCAT CCGCCTGAAC ACGAAAGATG AAAAACTGAC 

651 CGTCCATAAA GCGTATCAGG GCGGTGCGGA TTTCCTGTTC GGCTACGACG 

701 TGCGGGAGTC GGACAAACCC GCCCTGACCT TTGAAGAAAA AGTCAGCGGA 

751 CAATCCGGCG TGGTTTTGGA ACGCCGGCCG GAAAATCTGA AAACGCTCGA 

801 CGGGCGCAAA CTGATTGCGG CGGAAAAGGC AGACTCTAAT TCGTTTGCGT 

851 TTAAACAAAA TTACCGGCAG GGACTGTACG AATTATTGCT CAAGCAATGC 

901 GAAGGCGGAT TTTGCTTGGG CGTGCAGCGT TTGGCTATCC CCGAGGCGGA 

951 AGCGGTTTTA TATGCCCAAC AGGCTTATGC GGCAAATACT TTGTTCGGGC 

1001 TGCGTGCCGC CGACAGGGGC GACGACGTGT ATGCCGCCGA TCCGTCCCGT 

1051 CAAAAATTGT GGCTGCGCTT CATCGGCGGC CGGTCGCATC AAAATATACG 

1101 GGGCGGCGCG GCTGCGGACG GGCGGCGCAA AGGCGTGCAA ATCGGCGGCG 

1151 AGGTGTTTGT ACGGCAAAAT GAAGGCAGCC GGCTGGCAAT CGGCGTGATG 

1201 GGCGGCAGGG CTGGCCAGCA CGCATCAGTC AACGGCAAAG GCGGTGCGGC 

1251 AGGCAGTTAT TTGCATGGTT ATGGCGGGGG TGTTTATGCT GCGTGGCATC 

1301 AGTTGCGCGA TAAACAAACG GGTGCGTATT TGGACGGCTG GTTGCAATAC 

1351 CAACGTTTCA AACACCGCAT CAATGATGAA AACCGTGCGG AACGCTACAA 

1401 AACCAAAGGT TGGACGGCTT CTGTCGAAGG CGGCTACAAC GCGCTTGTGG 

1451 CGGAAGGCGT TGTCGGAAAA GGCAATAATG TGCGGTTTTA CCTGCAACCG 

1501 CAGGCGCAGT TTACCTACTT GGGCGTAAAC GGCGGCTTTA CCGACAGCGA 

1551 GGGGACGGGG GTCGGACTGC TCGGCAGCGG TCAGTGGCAA AGCCGCGCCG 

1601 GCATTCGGGC AAAAACCCGT TTTGCTTTGC GTAACGGTGT CAATCTTCAG 

1651 CCTTTTGCCG CTTTTAATGT TTTGCACAGG TCAAAATCTT TCGGCGTGGA 

1701 AATGGACGGC GAAAAACAGA CGCTGGCAGG CAGGACGGCG CTCGAAGGGC 

1751 GGTTCGGCAT TGAAGCCGGT TGGAAAGGCC ATATGTCCGC ACGCATCGGA 

1801 TACGGCAAAA GGACGGACGG CGACAAAGAA GCCGCATTGT CGCTCAAATG 

1851 GCTGTTTTGA 

This encodes a protein having amino acid sequence <SEQ ID 454>: 



1 MFRAQLGSNT RSTKIGDDAD FSFSDKPKPG TSHYFSSGKT DQNSSEYGYD

51 EINIQGKNYN SGILAVDNMP VVKKYITDTY GDNLKDAVKK QLQDLYKTRP

101 EAWEENKKRT EEAYIEQLGP KFSILKQKNP DLINKLVEDS VLTPHSNTSQ

151 TSLNNIFNKK LHVKIENKSH VAGQVLELTK MTLKDSLWEP RRHSDIHMLE

201 TSDNARIRLN TKDEKLTVHK AYQGGADFLF GYDVRESDKP ALTFEEKVSG

251 QSGVVLERRP ENLKTLDGRK LIAAEKADSN SFAFKQNYRQ GLYELLLKQC

301 EGGFCLGVQR LAIPEAEAVL YAQQAYAANT LFGLRAADRG DDVYAADPSR

351 QKLWLRFIGG RSHQNIRGGA AADGRRKGVQ IGGEVFVRQN EGSRLAIGVM

401 GGRAGQHASV NGKGGAAGSY LHGYGGGVYA AWHQLRDKQT GAYLDGWLQY

451 QRFKHRINDE NRAERYKTKG WTASVEGGYN ALVAEGVVGK GNNVRFYLQP

501 QAQFTYLGVN GGFTDSEGTA VGLLGSGQWQ SRAGIRAKTR FALRNGVNLQ

551 PFAAFNVLHR SKSFGVEMDG EKQTLAGRTA LEGRFGIEAG WKGHMSARIG

601 YGKRTDGDKE AALSLKWLF*



Homology with a predicted ORF from N. gonorrhoeae

ORF35 shows 51.7% identity over a 261aa overlap with a predicted ORF (ORF35ngh) from N. gonorrhoeae:



orf35.pep                           PCRRQGDDVYAAHASRQKLWLRFIGGRSHQNIRG 34
orf35ngh  FTKVQERDDIAIYAQQAQAANTLFALRLNDKNSDIFDRTLPRKGLWLRVIDGHSNQWVQG 370






orf35.pep GAA-ADGWRKGVQIGGEVFVRQNEGSXLAIGVMGGRAGQHASVNGKG--GAAGSDLYGYG 91
orf35ngh  KTAPVEGYRKGVQLGGEVFTWQNESNQLSIGLMGGQAEQRSTFRNPDTDNLTTGNVKGFG 430

orf35.pep GGVYAAWHQLRDKQTGAYLDGWLQYQRFKHRINDENRAERYKTKGWTASVEGGYNALVAE 151
orf35ngh  AGVYATWHQLQDKQTGAYVDSWMQYQRFRHRINTEYATERFTSKGITASIEAGYNALLAE 490

orf35.pep GIVGKGNNVRFYLQPQAQFTYLGVNGGFTDSEGTAVGLLGSGQWQSRAGIRAKTRFALRN 211
orf35ngh  HFTKKGNSLRVYLQPQAQLTYLGVNGKFSDSENAQVNLLGSRQLQSRVGVQAKAQFAFTN 550

orf35.pep GVNLQPFAAFNVLHRSKSFGVEMDGEKQTLAGRTALEGRFGIEAGWKGHMSA 263
orf35ngh  GVTFQPFVAVNSIYQQKPFGVEIDGDRRVINNKTVIETQLGVAAKIKSHLTLQASFNRQT 610

A partial ORF35ngh nucleotide sequence <SEQ ID 455> is predicted to encode a protein having 
partial amino acid sequence <SEQ ID 456>: 



1 ..KKLRDRNSEY WKEETYHIKS NGRTYPNIPA LFPKHPFDPF ENINNSKKIS

51 FYDKEYTEDY LVGFARGFGV EKRNGEEEKP LRQYFKDCVN TENSNNDNCK 

101 ISSFGNYGPI LIKSDIFALA SQIKNSHINS EILSVGNYIE WLRPTLNKLT 

151 GWQEHLYAGL DPFHYIEVTD NSHVIGQTID LGALELTNSL WKPRWNSNID 

201 YLITKNAEIR FNTKNESLLV KEDYAGGARF RFAYDLKDKV PEIPVLTFEK 

251 NITGTSDIIF EGKALDNLKH LDGHQIVKVN DTADKDAFRL SSKYRKGIYT 

301 LSLQQRPEGF FTKVQERDDI AIYAQQAQAA NTLFALRLND KNSDIFDRTL 

351 PRKGLWLRVI DGHSNQWVQG KTAPVEGYRK GVQLGGEVFT WQNESNQLSI 

401 GLMGGQAEQR STFRNPDTDN LTTGNVKGFG AGVYATWHQL QDKQTGAYVD 

451 SWMQYQRFRH RINTEYATER FTSKGITASI EAGYNALLAE HFTKKGNSLR 

501 VYLQPQAQLT YLGVNGKFSD SENAQVNLLG SRQLQSRVGV QAKAQFAFTN 

551 GVTFQPFVAV NSIYQQKPFG VEIDGDRRVI NNKTVIETQL GVAAKIKSHL 

601 TLQASFNRQT SKHHHAKQGA LNLQWTF* 

Based on this prediction, these proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies.



Example 55 



The following partial DNA sequence was identified in N. meningitidis <SEQ ID 457>:

1 ..GCGGAATATG TTCAGTTCTC TATAGATTTG TTCAGTGTGG GTAAATCGGG 

51 GGGCGGTATA CCTAAGGCTA AGCCTGTGTT TGATGCGAAA CCGAGATGGG 

101 AGGTTGATAG GAAGCTTAAT AAATTGACAA CTCGTGAGCA GGTGGAGAAA 

151 AATGTTCAGG AAACGAGAAG AAGGAGTCAG AGTAGTCAGT TTAAAGCCCA 

201 TGCGCAACGA GAATGGGAAA ATAAAACAGG GTTAGATTTT AATCATTTTA 

251 TAGGTGGTGA TATCAATAAA AAAGGCACAG TAACAGGAGG GCATAGTCTA 

301 ACCCGTGGTG ATGTACGGGT GATACAACAA ACCTCGGCAC CTGATAAACA 

351 TGGGGT.TTA TCAAGCGACA GTGGAAATTN A 

This corresponds to the amino acid sequence <SEQ ID 458; ORF46>: 



1 . . AEYVQFSIDL FSVGKSGGGI PKAKPVFDAK PRWEVDRKLN KLTTREQVEK 
51 NVQETRRRSQ SSQFKAHAQR EWENKTGLDF NHFIGGDINK KGTVTGGHSL 
101 TRGDVRVIQQ TSAPDKHGXL SSDSGNX 

Further work revealed a further partial nucleotide sequence <SEQ ID 459>:



1 ..GCAGTGTGCC TnCCGATGCA TGCACACGCC TCAnATTTGG CAAACGATTC

51 TTTTATCCGG CAGGTTCTCG ACCGTCAGCA TTTCGAACCC GACGGGAAAT

101 ACCACCTATT CGGCAGCAGG GGGGAACTTG CCGAGCGCCA GTCTCATATC

151 GGATTGGGAA AAATACAAAG CCATCAGTTG GGCAACCTGA TGATTCAACA

201 GGCGGCCATT AAAGGAAATA TCGGCTACAT TGTCCGCTTT TCCGATCACG

251 GGCACGAAGT CCATTCCCCs TTCGACAACC ATGCCTCACA TTCCGATTCT

301 GATGAAGCCG GTAGTCCCGT TGACGGATTT AGCCTTTACC GCATCCATTG

351 GGACGGATAC GAACACCATC CCGCCGACGG CTATGACGGG CCACAGGGCG

401 GCGGCTATCC CGCTCCCAAA GGCGCGAGGG ATATATACAG TTACGACATA

451 AAAGGCGTTG CCCAAAATAT CCGCCTCAAC CTGACCGACA ACCGCAGCAC

501 CGGACAACGG CTTGCCGACC GTTTCCACAA TGCCGGTAGT ATGCTGACGC 

551 AAGGAGTAGG CGACGGATTC AAACGCGCCA CCCGATACAG CCCCGAGCTG 

601 GACAGATCGG GCAATGCCGC CGAAGCCTTC AACGGCACTG CAGATATCGT 

651 TAAAAACATC ATCGGCGCTG CAGGAGAAAT TGT 

This corresponds to the amino acid sequence <SEQ ID 460; ORF46-1>:

1 ..AVCLPMHAHA SXLANDSFIR QVLDRQHFEP DGKYHLFGSR GELAERQSHI

51 GLGKIQSHQL GNLMIQQAAI KGNIGYIVRF SDHGHEVHSP FDNHASHSDS 

101 DEAGSPVDGF SLYRIHWDGY EHHPADGYDG PQGGGYPAPK GARDIYSYDI 

151 KGVAQNIRLN LTDNRSTGQR LADRFHNAGS MLTQGVGDGF KRATRYSPEL 

201 DRSGNAAEAF NGTADIVKNI IGAAGEI 

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. gonorrhoeae 

ORF46 shows 98.2% identity over a 111aa overlap with a predicted ORF (ORF46ng) from N. gonorrhoeae:

orf46.pep                AEYVQFSIDLFSVGKSGGGIPKAKPVFDAKPRWEVDRKLNKLTTR 45
orf46ng   PKTGVPFDGKGFPNFEKHVKYDTKLDIQELSGGGIPKAKPVFDAKPRWEVDRKLNKLTTR 217

orf46.pep EQVEKNVQETRRRSQSSQFKAHAQREWENKTGLDFNHFIGGDINKKGTVTGGHSLTRGDV 105
orf46ng   EQVEKNVQETRRRSQSSQFKAHAQREWENKTGLDFNHFIGGDINKKGAVTGGHSLTRGDV 277

orf46.pep RVIQQTSAPDKHGXLSSDSGN 126
orf46ng   RVIQQTSAPDKHGVLSSDSGN 298

A partial ORF46ng nucleotide sequence <SEQ ID 461> is predicted to encode a protein having partial amino acid sequence <SEQ ID 462>:

1 ..RRLKHCCHAR LGSAFHRKQD GAHQRFGRYG ATQRLCRSSH PRLGSPKPQC

51 RTRHRSRQQY LYGSHPHQRD WSCPGKIQLG RHHGTSCRAV ADXRDRICER

101 EIRRQRQXCR CRLGKIPSLS IPKYPLKLEQ RYGKENITSS TVPPSNGKNV

151 KLADQRHPKT GVPFDGKGFP NFEKHVKYDT KLDIQELSGG GIPKAKPVFD

201 AKPRWEVDRK LNKLTTREQV EKNVQETRRR SQSSQFKAHA QREWENKTGL

251 DFNHFIGGDI NKKGAVTGGH SLTRGDVRVI QQTSAPDKHG VLSSDSGN*



Further work revealed the complete gonococcal DNA sequence <SEQ ID 463>: 



1 TTGGGCATTT CCCGCAAAAT ATCCCTTATT CTGTCCATAC TGGCAGTGTG

51 CCTGCCGATG CATGCACACG CCTCAGATTT GGcaAACGAT CCCTTTATCC

101 GgCaggttcT CGaccGTCAG CATTTCGaac ccgacggGAa ATACCaCCTA

151 TTcggCaGCA GGGGGGAGCT TgccnagcGC aacggccATa tcggattggG

201 aaacaTAcaa Agccatcagt tGggccacct gatgattcaa caggcggccg

251 ttgaaggaaA TAtcgGctac attgtccgct tttccgatca cgggcacaaa

301 ttccattcgc ccttcGAcaa ccaTGCCTCA CATTCCGATT CTGACGAAGC

351 CGGTAGTCCC GTTGACGGAT TCAGCCTTTA CCGCATCCAT TGGGACGGAT

401 ACGAACACCA TCCCGCCGAC GGCTATGACG GGCCACAGGG CGGCGGCTAT

451 CCCGCTCCCA AAGGCGCGAG GGATATATAC AGCTACGACA TAAAAGGCGT

501 TGCCCAAAAT ATCCGCCTCA ACCTGACCGA CAACCGCAGC ACCGGACAAC

551 GGCTTGCCGA CCGTTTCCAC AATGCCGGCG CTATGCTGAC GCAAGGAGTA

601 GGCGACGGAT TCAAACGCGC CACCCGATAC AGCCCCGAGC TGGACAGATC

651 GGGCAATGCc gccGAAGCCT TCAACGGCAC TGCAGATATC GTCAAAAACA

701 TCATCGGCGC GGCAGGAGAA ATTGTCGGCG CAGGCGATGC CGTGCagGGT

751 ATAAGCGAAG GCTCAAACAT TGCTGTCATG CACGGCTTGG GTCTGCTTTC

801 CACCGAAAAC AAGATGGCGC GCATCAACGA TTTGGCAGAT ATGGCGCAAC

851 TCAAAGACTA TGCCGCAGCA GCCATCCGCG ATTGGGCAGT CCAAAACCCC

901 AATGCCGCAC AAGGCATAGA AGCCGTCAGC AATATCTTTA TGGCAGCCAT

951 CCCCATCAAA GGGATTGGAG CTGTCCGGGG AAAATACGGC TTGGGCGGCA

1001 TCACGGCACA TCCTGTCAAG CGGTCGCAGA TGGGCGCGAT CGCATTGCCG

1051 AAAGGGAAAT CCGCCGTCAG CGACAATTTT GCCGATGCGG CATACGCCAA

1101 ATACCCGTCC CCTTACCATT CCCGAAATAT CCGTTCAAAC TTGGAGCAGC

1151 GTTACGGCAA AGAAAACATC ACCTCCTCAA CCGTGCCGCC GTCAAACGGC

1201 AAAAATGTCA AACTGGCAGA CCAACGCCAC CCGAAGACAG GCGTACCGTT 

1251 TGACGGTAAA GGGTTTCCGA ATTTTGAGAA GCACGTGAAA TATGATACGA 

1301 AGCTCGATAT TCAAGAATTA TCGGGGGGCG GTATACCTAA GGCTAAGCCT 

1351 GTGTTTGATG CGAAACCGAG ATGGGAGGTT GATAGGAAGC TTAATAAATT 

1401 GACAACTCGT GAGCAGGTGG AGAAAAATGT TCAGGAAACG AGAAGAAGGA 

1451 GTCAGAGTAG TCAGTTTAAA GCCCATGCGC AACGAGAATG GGAAAATAAA 

1501 ACAGGGTTAG ATTTTAATCA TTTTATAGGT GGTGATATCA ATAAGAAAGG 

1551 CACAGTAACA GGAGGGCATA GTCTAACCCG TGGTGATGTA CGGGTGATAC 

1601 AACAAACCTC GGCACCTGAT AAACATGGGG TTTATCAAGC GACAGTGGAA 

1651 ATTAAAAAGC CTGATGGAAG TTGGGAGGTG AAAACGAAAA AAGGTGGGAA 

1701 AGTGATGACC AAGCACACCA TGTTCCCAAA AGATTGGGAT GAGGCTAGAA 

1751 TTAGGGCTGA AGTTACTTCG GCTTGGGAAA GTAGAATAAT GCTTAAGGAT 

1801 AATAAATGGC AGGGTACAAG TAAATCGGGT ATTAAAATAG AAGGATTTAC 

1851 CGAACCTAAT AGAACAGCAT ATCCCATTTA TGAATAG 

This corresponds to the amino acid sequence <SEQ ID 464; ORF46ng-1>:

1 LGISRKISLI LSILAVCLPM HAHASDLAND PFIRQVLDRQ HFEPDGKYHL

51 FGSRGELAXR NGHIGLGNIQ SHQLGHLMIQ QAAVEGNIGY IVRFSDHGHK

101 FHSPFDNHAS HSDSDEAGSP VDGFSLYRIH WDGYEHHPAD GYDGPQGGGY

151 PAPKGARDIY SYDIKGVAQN IRLNLTDNRS TGQRLADRFH NAGAMLTQGV

201 GDGFKRATRY SPELDRSGNA AEAFNGTADI VKNIIGAAGE IVGAGDAVQG

251 ISEGSNIAVM HGLGLLSTEN KMARINDLAD MAQLKDYAAA AIRDWAVQNP

301 NAAQGIEAVS NIFMAAIPIK GIGAVRGKYG LGGITAHPVK RSQMGAIALP

351 KGKSAVSDNF ADAAYAKYPS PYHSRNIRSN LEQRYGKENI TSSTVPPSNG

401 KNVKLADQRH PKTGVPFDGK GFPNFEKHVK YDTKLDIQEL SGGGIPKAKP

451 VFDAKPRWEV DRKLNKLTTR EQVEKNVQET RRRSQSSQFK AHAQREWENK

501 TGLDFNHFIG GDINKKGTVT GGHSLTRGDV RVIQQTSAPD KHGVYQATVE

551 IKKPDGSWEV KTKKGGKVMT KHTMFPKDWD EARIRAEVTS AWESRIMLKD

601 NKWQGTSKSG IKIEGFTEPN RTAYPIYE*



ORF46ng-1 and ORF46-1 show 94.7% identity over a 227 aa overlap:



orf46-1.pep               AVCLPMHAHASXLANDSFIRQVLDRQHFEPDGKYHLFGSRGELAER 46
orf46ng-1   LGISRKISLILSILAVCLPMHAHASDLANDPFIRQVLDRQHFEPDGKYHLFGSRGELAXR 60

orf46-1.pep QSHIGLGKIQSHQLGNLMIQQAAIKGNIGYIVRFSDHGHEVHSPFDNHASHSDSDEAGSP 106
orf46ng-1   NGHIGLGNIQSHQLGHLMIQQAAVEGNIGYIVRFSDHGHKFHSPFDNHASHSDSDEAGSP 120

orf46-1.pep VDGFSLYRIHWDGYEHHPADGYDGPQGGGYPAPKGARDIYSYDIKGVAQNIRLNLTDNRS 166
orf46ng-1   VDGFSLYRIHWDGYEHHPADGYDGPQGGGYPAPKGARDIYSYDIKGVAQNIRLNLTDNRS 180

orf46-1.pep TGQRLADRFHNAGSMLTQGVGDGFKRATRYSPELDRSGNAAEAFNGTADIVKNIIGAAGE 226
orf46ng-1   TGQRLADRFHNAGAMLTQGVGDGFKRATRYSPELDRSGNAAEAFNGTADIVKNIIGAAGE 240

orf46-1.pep I 227
orf46ng-1   IVGAGDAVQGISEGSNIAVMHGLGLLSTENKMARINDLADMAQLKDYAAAAIRDWAVQNP 300

Homology with a predicted ORF from N. meningitidis (strain A)

ORF46ng-1 shows 87.4% identity over a 486aa overlap with an ORF (ORF46a) from strain A of N. meningitidis:

10 20 30 40 50 60

orf46a.pep LGISRKISLILSILAVCLPMHAHASDLANDSFIRQVLDRQHFEPDGKYHLFGSRGELAER
I I t I I I I I I I I I I I I I I I I I I I I i I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I
orf46ng-1 LGISRKISLILSILAVCLPMHAHASDLANDPFIRQVLDRQHFEPDGKYHLFGSRGELAXR
10 20 30 40 50 60

70 80 90 100 110 120

orf46a.pep SGHIGLGNIQSHQLGNLFIQQAAIKGNIGYIVRFSDHGHEVHSPFDNHASHSDSDEAGSP
: I I I I I I I I i i i E I I : I : I I I I I :: I I I I I I I I I I I I i I : I II I I I I I I I I I I I I I I I I
orf46ng-1 NGHIGLGNIQSHQLGHLMIQQAAVEGNIGYIVRFSDHGHKFHSPFDNHASHSDSDEAGSP
70 80 90 100 110 120

130 140 150 160 170 180

orf46a.pep VDGFSLYRIHWDGYEHHPADGYDGPQGGGYPAPKGARDIYSYDIKGVAQNIRLNLTDNRS
I I I I i I II I I I I I 1 I I I I I I I I I I I ! I I I M I I I I I I I I I I I I I i 1 I I I I I I I I I I I I I I
orf46ng-1 VDGFSLYRIHWDGYEHHPADGYDGPQGGGYPAPKGARDIYSYDIKGVAQNIRLNLTDNRS
130 140 150 160 170 180

190 200 210 220 230 240

orf46a.pep TGQRLVDRFHNTGSMLTQGVGDGFKRATRYSPELDRSGNAAEAFNGTADIVKNIIGAAGE
I I I I I : I I I I I : I : I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I
orf46ng-1 TGQRLADRFHNAGAMLTQGVGDGFKRATRYSPELDRSGNAAEAFNGTADIVKNIIGAAGE
190 200 210 220 230 240

250 260 270 280 290 300

orf46a.pep IVGAGDAVQGISEGSNIAVMHGLGLLSTENKMARINDLADMAQLKDYAAAAIRDWAVQNP
I I I I I I ! I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I II I 1 I I I I I I I I I I I I I I I
orf46ng-1 IVGAGDAVQGISEGSNIAVMHGLGLLSTENKMARINDLADMAQLKDYAAAAIRDWAVQNP
250 260 270 280 290 300

310 320 330 340 350 360

orf46a.pep NAAQGIEAVSNIFTAVIPVKGIGAVRGKYGLGGITAHPVKRSQMGEIALPKGKSAVSDNF
I I II I I I I I I I I I I : I 1 : I I I I I I I I II I I I I I I I II I I I I I I I I I I I I I I I I I I I I I
orf46ng-1 NAAQGIEAVSNIFMAAIPIKGIGAVRGKYGLGGITAHPVKRSQMGAIALPKGKSAVSDNF
310 320 330 340 350 360

370 380 390 400 410 420

orf46a.pep ADAAYAKYPSPYHSRNIRSNLEQRYGKENITSSTVPPSNGKNVKLANKRHPKTKVPFDGK
I I I I I I I I I I I I I I I I I I I 1 I I I I I I I I I I I I I I I I I I I I I I I I I I : : I I I I I 111(11
orf46ng-1 ADAAYAKYPSPYHSRNIRSNLEQRYGKENITSSTVPPSNGKNVKLADQRHPKTGVPFDGK
370 380 390 400 410 420

430 440 450 460 470

orf46a.pep GFPNFEKDVKYDTRINTAVPQVNPIDEPVFN------PKGSVGSAHSWSITARIQYAKLP
I I I I I I I I I I I I : : : : : : : I : M I : I : I : : : I : I I I
orf46ng-1 GFPNFEKHVKYDTKLD--IQELSGGGIPKAKPVFDAKPRWEVDRKLN-KLTTREQVEKNV
430 440 450 460 470

480 490 500 510 520 530

orf46a.pep RQGRIRYIPPKNYSPSAPLPKGPNNGYLDKFGNEWTKGPSRTKGQEFEWDVQLSKTGREQ
:: I I
orf46ng-1 QETRRRSQSSQFKAHAQREWENKTGLDFNHFIGGDINKKGTVTGGHSLTRGDVRVIQQTS
480 490 500 510 520 530
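The identity figures quoted in these comparisons (for example 87.4% over a 486aa overlap) are ratios of identical residues to aligned residues. A minimal Python sketch of that ratio follows, assuming identity is counted only over columns where neither row has a gap ('-'); the name `percent_identity` is illustrative, and the alignment program used in the original analysis may normalise differently, so this is not expected to reproduce the reported whole-overlap figures. It is applied here to the first 60-column block above only.

```python
def percent_identity(row_a: str, row_b: str) -> tuple[float, int, int]:
    """Return (percent identity, identical columns, compared columns).

    Assumption: columns where either row has a gap ('-') are skipped;
    every other column is compared, so an 'X' facing a residue counts
    as a mismatch.
    """
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    identical = compared = 0
    for a, b in zip(row_a.upper(), row_b.upper()):
        if a == "-" or b == "-":
            continue  # gap column: not compared
        compared += 1
        if a == b:
            identical += 1
    percent = 100.0 * identical / compared if compared else 0.0
    return percent, identical, compared


# First 60-column block of the ORF46a / ORF46ng-1 comparison.
orf46a = "LGISRKISLILSILAVCLPMHAHASDLANDSFIRQVLDRQHFEPDGKYHLFGSRGELAER"
orf46ng = "LGISRKISLILSILAVCLPMHAHASDLANDPFIRQVLDRQHFEPDGKYHLFGSRGELAXR"
pct, same, total = percent_identity(orf46a, orf46ng)
print(f"{pct:.1f}% ({same}/{total})")  # 96.7% (58/60)
```

Skipping gap columns is one common convention; dividing by the full overlap length (gaps included) would give a lower figure for blocks such as the 430-470 block above.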



The complete length ORF46a DNA sequence <SEQ ID 465> is:

1 TTGGGCATTT CCCGCAAAAT ATCCCTTATT CTGTCCATAC TGGCAGTGTG

51 CCTGCCGATG CATGCACACG CCTCAGATTT GGCAAACGAT TCTTTTATCC

101 GGCAGGTTCT CGACCGTCAG CATTTCGAAC CCGACGGGAA ATACCACCTA

151 TTCGGCAGCA GGGGGGAACT TGCCGAGCGC AGCGGTCATA TCGGATTGGG

201 AAACATACAA AGCCATCAGT TGGGCAACCT GTTCATCCAG CAGGCGGCCA

251 TTAAAGGAAA TATCGGCTAC ATTGTCCGCT TTTCCGATCA CGGGCACGAA

301 GTCCATTCCC CCTTCGACAA CCATGCCTCA CATTCCGATT CTGATGAAGC

351 CGGTAGTCCC GTTGACGGAT TCAGCCTTTA CCGCATCCAT TGGGACGGAT

401 ACGAACACCA TCCCGCCGAC GGCTATGACG GGCCACAGGG CGGCGGCTAT

451 CCCGCTCCCA AAGGCGCGAG GGATATATAC AGCTACGACA TAAAAGGCGT

501 TGCCCAAAAT ATCCGCCTCA ACCTGACCGA CAACCGCAGC ACCGGACAAC

551 GGCTTGTCGA CCGTTTCCAC AATACCGGTA GTATGCTGAC GCAAGGAGTA

601 GGCGACGGAT TCAAACGCGC CACCCGATAC AGCCCCGAGC TGGACAGATC

651 GGGCAATGCC GCCGAAGCTT TCAACGGCAC TGCAGATATC GTCAAAAACA

701 TCATCGGCGC GGCAGGAGAA ATTGTCGGCG CAGGCGATGC CGTGCAGGGT

751 ATAAGCGAAG GCTCAAACAT TGCTGTTATG CACGGCTTGG GTCTGCTTTC

801 CACCGAAAAC AAGATGGCGC GCATCAACGA TTTGGCAGAT ATGGCGCAAC

851 TCAAAGACTA TGCCGCAGCA GCCATCCGCG ATTGGGCAGT CCAAAACCCC

901 AATGCCGCAC AAGGCATAGA AGCCGTCAGC AATATCTTTA CGGCAGTCAT 

951 CCCCGTCAAA GGGATTGGAG CTGTTCGGGG AAAATACGGC TTGGGCGGCA 

1001 TCACGGCACA TCCTGTCAAG CGGTCGCAGA TGGGCGAGAT CGCATTGCCG 

1051 AAAGGGAAAT CCGCCGTCAG CGACAATTTT GCCGATGCGG CATACGCCAA 

1101 ATACCCGTCC CCTTACCATT CCCGAAATAT CCGTTCAAAC TTGGAGCAGC 

1151 GTTACGGCAA AGAAAACATC ACCTCCTCAA CCGTGCCGCC GTCAAACGGA 

1201 AAGAATGTGA AACTGGCAAA CAAACGCCAC CCGAAGACCA AAGTGCCGTT 

1251 TGACGGTAAA GGGTTTCCGA ATTTTGAAAA AGACGTAAAA TACGATACGA 

1301 GAATTAATAC CGCTGTACCA CAAGTGAATC CTATAGATGA ACCCGTCTTT 

1351 AATCCTAAAG GTTCTGTCGG ATCGGCTCAT TCTTGGTCTA TAACTGCCAG 

1401 AATTCAATAC GCAAAATTAC CAAGGCAAGG TAGAATCAGA TATATCCCAC 

1451 CTAAAAATTA CTCTCCTTCA GCACCGCTAC CAAAAGGACC TAATAATGGA

1501 TATTTGGATA AATTTGGTAA TGAATGGACT AAAGGTCCAT CAAGAACTAA 

1551 AGGTCAAGAA TTTGAATGGG ATGTTCAATT GTCTAAAACA GGAAGAGAGC 

1601 AACTTGGATG GGCTAGTAGG GATGGTAAGC ATTTAAATAT ATCAATTGAT 

1651 GGAAAGATTA CACACAAATG A 

This corresponds to the amino acid sequence <SEQ ID 466>: 



1 LGISRKISLI LSILAVCLPM HAHASDLAND SFIRQVLDRQ HFEPDGKYHL
51 FGSRGELAER SGHIGLGNIQ SHQLGNLFIQ QAAIKGNIGY IVRFSDHGHE 
101 VHSPFDNHAS HSDSDEAGSP VDGFSLYRIH WDGYEHHPAD GYDGPQGGGY 
151 PAPKGARDIY SYDIKGVAQN IRLNLTDNRS TGQRLVDRFH NTGSMLTQGV 
201 GDGFKRATRY SPELDRSGNA AEAFNGTADI VKNIIGAAGE IVGAGDAVQG 
251 ISEGSNIAVM HGLGLLSTEN KMARINDLAD MAQLKDYAAA AIRDWAVQNP 
301 NAAQGIEAVS NIFTAVIPVK GIGAVRGKYG LGGITAHPVK RSQMGEIALP 
351 KGKSAVSDNF ADAAYAKYPS PYHSRNIRSN LEQRYGKENI TSSTVPPSNG 
401 KNVKLANKRH PKTKVPFDGK GFPNFEKDVK YDTRINTAVP QVNPIDEPVF 
451 NPKGSVGSAH SWSITARIQY AKLPRQGRIR YIPPKNYSPS APLPKGPNNG 
501 YLDKFGNEWT KGPSRTKGQE FEWDVQLSKT GREQLGWASR DGKHLNISID 
551 GKITHK* 
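The step from <SEQ ID 465> to <SEQ ID 466> is a codon-by-codon translation of the first reading frame. A minimal Python sketch follows, assuming the standard genetic code (NCBI translation table 1) and translation from the first base; ORF46a begins with TTG, which the listing above renders as L rather than Met, and the sketch does the same. The name `translate` is illustrative and is not the software used in the original analysis; it raises `KeyError` on any base other than A, C, G or T.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI table 1); codons enumerated TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna: str) -> str:
    """Translate frame 1 of a DNA string, stopping after the first stop codon ('*')."""
    dna = "".join(dna.split()).upper()  # drop the spaces used in the listings
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        protein.append(aa)
        if aa == "*":
            break
    return "".join(protein)


# Row 1 of <SEQ ID 465> (a trailing incomplete codon is ignored) and row 1651.
print(translate("TTGGGCATTT CCCGCAAAAT ATCCCTTATT CTGTCCATAC TGGCAGTGTG"))  # LGISRKISLILSILAV
print(translate("GGAAAGATTA CACACAAATG A"))  # GKITHK*
```

Row 1651 starts a codon because 1650 is a multiple of 3, so its translation matches residues 551-556 of <SEQ ID 466> and the terminal stop.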

Based on this analysis, including the presence in the gonococcal protein of an RGD sequence, a motif typical of adhesins, it is predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies.
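Locating a short motif such as RGD is a plain substring scan. A minimal Python sketch follows; `find_motif` is a hypothetical helper, not part of the original analysis. It is applied to the last ORF46ng-1 row of the ORF46a comparison (the row beginning QETRRRSQ) and to the meningococcal row aligned with it. The reported positions are columns within that 60-residue row, not residue numbers in the full protein.

```python
import re


def find_motif(protein: str, motif: str = "RGD") -> list[int]:
    """Return 1-based start positions of every (possibly overlapping) match of motif."""
    # A zero-width lookahead lets overlapping occurrences be reported.
    pattern = re.compile(f"(?={re.escape(motif.upper())})")
    return [m.start() + 1 for m in pattern.finditer(protein.upper())]


gonococcal_row = "QETRRRSQSSQFKAHAQREWENKTGLDFNHFIGGDINKKGTVTGGHSLTRGDVRVIQQTS"
meningococcal_row = "RQGRIRYIPPKNYSPSAPLPKGPNNGYLDKFGNEWTKGPSRTKGQEFEWDVQLSKTGREQ"
print(find_motif(gonococcal_row))     # [50]
print(find_motif(meningococcal_row))  # []
print(find_motif("AARGDSRGD"))        # [3, 7]  (toy input)
```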



Example 56 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 467>:

1 ATGAATATTC ACACCCTGCT CTCCAAACAA TGGACGCTGC CGCCATTCCT 

51 GCCGAAACGG CTGCTGCTGT CCCTGCTGAT ACTGCTTGCC CCCAATGCGG 

101 TGTTTTGGGT TTTGGCACTG CTGACCGCCA CCGCCCGCCC GATTGTCAAT 

151 TTGGACTATC TTCCCGCCGC GCTGCTGATC GCCCTGCCTT GGCGTTTCGT 

201 CAAAATTGCC GGCGTATTGG CGTTTTGGCT GGCGGTTTTG TTTGACGGGC 

251 TGATGATGGT GATCCAACTC TTCCCTTTTA TGGATCTCAT CGGCGCCATC 

301 AACCTCGTCC CCTTCATCCT GACCGCCCCC GCCCCTTATC AGATAATGAC 

351 CGGGCTG... 

This corresponds to the amino acid sequence <SEQ ID 468; ORF48>: 



1 MNIHTLLSKQ WTLPPFLPKR LLLSLLILLA PNAVFWVLAL LTATARPIVN 
51 LDYLPAALLI ALPWRFVKIA GVLAFWLAVL FDGLMMVIQL FPFMDLIGAI 
101 NLVPFILTAP APYQIMTGL..

Further work revealed the complete nucleotide sequence <SEQ ID 469>: 



1 ATGAATATTC ACACCCTGCT CTCCAAACAA TGGACGCTGC CGCCATTCCT 

51 GCCGAAACGG CTGCTGCTGT CCCTGCTGAT ACTGCTTGCC CCCAATGCGG 

101 TGTTTTGGGT TTTGGCACTG CTGACCGCCA CCGCCCGCCC GATTGTCAAT 

151 TTGGACTATC TTCCCGCCGC GCTGCTGATC GCCCTGCCTT GGCGTTTCGT 

201 CAAAATTGCC GGCGTATTGG CGTTTTGGCT GGCGGTTTTG TTTGACGGGC 

251 TGATGATGGT GATCCAACTC TTCCCTTTTA TGGATCTCAT CGGCGCCATC 

301 AACCTCGTCC CCTTCATCCT GACCGCCCCC GCCCCTTATC AGATAATGAC 

351 CGGGCTGTTG CTGCTGTATA TGCTGGCGAT GCCGTTTGTG TTGCAGAAAG 
401 CCGCCGCCAA AACCGACTTC CGGCACATTG CCGTCTGCGC CGCCGTTGTG

451 GCGGCAGCCG GCTATTTCAC CGGCCATTTG AGTTACTACG ACCGGGGTCG 

501 GATGGCCAAT ATCTTCGGCG CAAACAACTT CTACTACGCC AAAAGTCAGG 

551 CGATGCTCTA CACCGTCAGC CAGAATGCCG ACTTTATTAC CGCCGGCCTG 

601 GTCGATCCCG TCTTCCTCCC CTTGGGCAAT CAACAGCGTG CCGCCACGCA 

651 TCTGAACGAG CCGAAATCTC AAAAAATCCT CTTTATCGTC GCCGAATCTT 

701 GGGGGCTGCC GGCCAATCCC GAACTTCAAA ACGCCACTTT TGCCAAACTG 

751 CTGGCGCAAA AAGACCGTTT TTCGGTTTGG GAAAGCGGCA GTTTTCCCTT 

801 CATCGGCGCG ACGGTCGAAG GCGAAATGCG CGAACTGTGT GCCTACGGCG 

851 GTTTGCGCGG GTTCGCACTG CGCCGCGCGC CCGACGAAAA ATTTGCCCGC 

901 TGCCTCCCCA ACCGTTTGAA ACAAGAAGGT TACGCCACCT TTGCGATGCA 

951 CGGCGCGGGC AGTTCGCTTT ACGACCGCTT CAGCTGGTAT CCGAGGGCGG 

1001 GCTTTCAAGA AATCAAAACC GCCGAAAACC TGATCGGTAA AAAAACCTGC 

1051 GCCATTTTCG GCGGCGTGTG CGACAGCGAG CTGTTCGGCG AAGTGTCGGC 

1101 ATTTTTCAAA AAACACGACA AGGGACTGTT TTACTGGATG ACGCTGACCA 

1151 GCCACGCCGA CTATCCCGAA TCCGACATTT TCAACCACAG GCTCAAATGC 

1201 ACCGAATATG GCCTGCCCGC CGAAACCGAC CTCTGCCGCA ATTTCAGCCT 

1251 GCACACCCAA TTCTTCGACC AACTGGCGGA TTTGATCCAA CGCCCCGAAA 

1301 TGAAAGGCAC GGAAGTCATC ATCGTCGGCG ACCATCCGCC GCCCGTCGGC 

1351 AACCTCAATG AAACCTTCCG CTACCTCAAA CAGGGGCACG TCGCCTGGCT 

1401 GAACTTCAAA ATCAAATAA 

This corresponds to the amino acid sequence <SEQ ID 470; ORF48-l>: 



1 MNIHTLLSKQ WTLPPFLPKR LLLSLLILLA PNAVFWVLAL LTATARPIVN

51 LDYLPAALLI ALPWRFVKIA GVLAFWLAVL FDGLMMVIQL FPFMDLIGAI

101 NLVPFILTAP APYQIMTGLL LLYMLAMPFV LQKAAAKTDF RHIAVCAAVV

151 AAAGYFTGHL SYYDRGRMAN IFGANNFYYA KSQAMLYTVS QNADFITAGL

201 VDPVFLPLGN QQRAATHLNE PKSQKILFIV AESWGLPANP ELQNATFAKL

251 LAQKDRFSVW ESGSFPFIGA TVEGEMRELC AYGGLRGFAL RRAPDEKFAR

301 CLPNRLKQEG YATFAMHGAG SSLYDRFSWY PRAGFQEIKT AENLIGKKTC

351 AIFGGVCDSE LFGEVSAFFK KHDKGLFYWM TLTSHADYPE SDIFNHRLKC

401 TEYGLPAETD LCRNFSLHTQ FFDQLADLIQ RPEMKGTEVI IVGDHPPPVG

451 NLNETFRYLK QGHVAWLNFK IK*
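Each listing in this document is a series of rows of the form "start position, then residues in groups of ten", so the row numbers double as a checksum. A minimal Python sketch that joins such a listing into one string and checks each row number against the residues read so far follows; `parse_listing` is a hypothetical helper, and it assumes '.' marks elided ends and '*' the stop, as in the listings here.

```python
import re


def parse_listing(listing: str) -> str:
    """Join a numbered sequence listing into one string, checking the row numbers.

    Each non-blank row is '<start position> <residues in groups of 10>'.
    Spaces and '.' (elided ends) are dropped; letters and '*' are kept.
    """
    parts: list[str] = []
    length = 0
    for row in listing.splitlines():
        row = row.strip()
        if not row:
            continue
        number, _, body = row.partition(" ")
        residues = re.sub(r"[^A-Za-z*]", "", body)
        if int(number) != length + 1:
            raise ValueError(f"row {number} should start at position {length + 1}")
        parts.append(residues)
        length += len(residues)
    return "".join(parts)


orf48_1 = parse_listing("""
1 MNIHTLLSKQ WTLPPFLPKR LLLSLLILLA PNAVFWVLAL LTATARPIVN

51 LDYLPAALLI ALPWRFVKIA GVLAFWLAVL FDGLMMVIQL FPFMDLIGAI
""")
print(len(orf48_1), orf48_1[:10])  # 100 MNIHTLLSKQ
```

A row whose number does not follow on, for example one carrying a stray page-margin number, raises `ValueError` instead of silently shifting every later position.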

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A) 

ORF48 shows 94.1% identity over a 119aa overlap with an ORF (ORF48a) from strain A of N. meningitidis:

10 20 30 40 50 60 

orf48.pep MNIHTLLSKQWTLPPFLPKRLLLSLLILLAPNAVFWVLALLTATARPIVNLDYLPAALLI
I I I I I I I I I I I I I I I i I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 1 I I I 
orf48a MNIHTLLSKQWTLPPFLPKRLLLSLLILLXPNAVFWVLALLTATARPIVNLXYLPAALLI

10 20 30 40 50 60 



70 80 90 100 110 119 

orf48.pep ALPWRFVKIAGVLAFWLAVLFDGLMMVIQLFPFMDLIGAINLVPFILTAPAPYQIMTGL
I I II I Mi MM II I II II I M I I I I II M II I I I II I II M I II I I II I I I I I 
orf48a ALPWRXVKIXGVLAXWLAVLFDGLMMVIQLFPFMDLIGAINLVPFIXTAPALYQIMTGLL

70 80 90 100 110 120 



orf48a LLYMLAMPFVLQKAAAKTDFRHIAACAAVVVAAGYFTGHLSXYDRGRMANIFGANNFYYA

130 140 150 160 170 180 

The complete length ORF48a nucleotide sequence <SEQ ID 471> is:

1 ATGAATATTC ACACCCTGCT CTCCAAACAA TGGACGCTGC CGCCATTCCT 

51 GCCGAAACGG CTGCTGCTGT CCCTGCTGAT ACTGCTNNCC CCCAATGCGG 

101 TGTTTTGGGT TTTGGCACTG CTGACCGCCA CCGCCCGCCC GATTGTCAAT 

151 TTGGANTACC TTCCCGCCGC GCTGCTGATC GCCCTGCCTT GGCGTNTCGT 

201 CAAAATTGNC GGCGTATTGG CGTNTTGGCT GGCGGTTTTG TTTGACGGGC 

251 TGATGATGGT GATCCAACTC TTCCCTTTTA TGGATCTCAT CGGCGCCATC 

301 AACCTCGTCC CCTTCATCNT GACCGCCCCC GCCCTTTATC AGATAATGAC 

351 CGGGCTGTTA CTGCTGTATA TGCTGGCGAT GCCGTTTGTG TTGCAGAAAG 

401 CCGCCGCCAA AACCGACTTC CGACACATTG CCGCCTGTGC CGCCGTTGTG 

451 GTGGCAGCCG GCTATTTTAC CGGCCATTTG AGTTANTACG ACCGGGGGCG 
501 GATGGCCAAT ATCTTCGGCG CAAACAACTT CTATTACGCC AAAAGTCAGG

551 CGATGCTCTA CACCGTCAGC CAGAATGCCG ACTTTATTAC CGCCGGCCTG 

601 GTCGATCCCG TCTTCCTCCC CTTGGGCAAT CAACAGCGTG CCGCCACGCA 

651 TCTGAACGAG CCGAAATCTC AAAAAATCCT CTTTATCGTC GCCGAATCTT 

701 GGGGGCTGCC GGCCAATCCC GAACTTCAAA ACGCCACTTT TGCCAAACTG 

751 CTGGCGCAAA AAGANCGTTT TTCGGTTTGG GAAAGCGGCA GTTTTCCCTT 

801 CATCGGCGCG ACGATCGAAG GCGAAATGCG CGAACTGTGT GCCTACGGCG 

851 GTTTGCGCGG GTTCGCACTG CGCCGCGCGC CCGACGAAAA ATTTGCCCGC 

901 TGCCTCCCCA ACCGTTTGAA ACAAGAAGGT TACGCCACCT TTGCGATGCA 

951 CGGCGCGGGC AGTTCGCTTT ACGACCGCTT CAGCTGGTAT CCGAGGGCGG 

1001 GCTTTCAAGA AATCAAAACC GCCGAAAACC TGATCGGTAA AAAAACCTGC 

1051 GCCATTTTCG GCGGCGTGTG CGACAGCGAG CTGTTCGGCG AAGTGTCGGC 

1101 ANTTTTCAAA AAACACGACA AGGGACTGTT TTACTGGATG ACGCTGACCA 

1151 GCCACGCCGA CTATCCCGAA TCNGACATTT TCAACCACAG GCTCAAATGC 

1201 ACCGAATATG GCCTGCCCGC CGAAACCGAC NTCTGCCGCA ATTTCAGCCT 

1251 GCACACCCAA TTCTTCGACC AACTGGCGGA TTTGATCCAA CGCCCCGAAA 

1301 TGAAAGGCAC GGAAGTCATC ATCGTCGGCG ACCATCCGCC GCCCGTCGGC 

1351 AACCTCAATG AAACCTTCCG CTACCTCAAA CAGGGGCACG TCGNCTGGCT 

1401 GAACTTCAAA ATCAAATAA 

This encodes a protein having amino acid sequence <SEQ ID 472>: 

1 MNIHTLLSKQ WTLPPFLPKR LLLSLLILLX PNAVFWVLAL LTATARPIVN

51 LXYLPAALLI ALPWRXVKIX GVLAXWLAVL FDGLMMVIQL FPFMDLIGAI

101 NLVPFIXTAP ALYQIMTGLL LLYMLAMPFV LQKAAAKTDF RHIAACAAVV

151 VAAGYFTGHL SXYDRGRMAN IFGANNFYYA KSQAMLYTVS QNADFITAGL

201 VDPVFLPLGN QQRAATHLNE PKSQKILFIV AESWGLPANP ELQNATFAKL

251 LAQKXRFSVW ESGSFPFIGA TIEGEMRELC AYGGLRGFAL RRAPDEKFAR

301 CLPNRLKQEG YATFAMHGAG SSLYDRFSWY PRAGFQEIKT AENLIGKKTC

351 AIFGGVCDSE LFGEVSAXFK KHDKGLFYWM TLTSHADYPE SDIFNHRLKC

401 TEYGLPAETD XCRNFSLHTQ FFDQLADLIQ RPEMKGTEVI IVGDHPPPVG

451 NLNETFRYLK QGHVXWLNFK IK*
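Several ORF48a codons contain N, yet the listing shows L at residue 29 and X only at residue 30. That pattern is what results if a codon containing N is translated to an amino acid only when every possible base gives the same one (CTN is always Leu) and to X otherwise. A minimal Python sketch under that assumption follows. It handles only N, since other IUPAC ambiguity codes are not expected here, and `translate_ambiguous` is an illustrative name. Applied to the first 100 nt of <SEQ ID 471>, it reproduces residues 1-33 of <SEQ ID 472>.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI table 1); codons enumerated TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
# Assumption: N is the only ambiguity code present; it may stand for any base.
EXPANSIONS = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "ACGT"}


def translate_codon(codon: str) -> str:
    """Amino acid shared by every reading of the codon, or 'X' if the readings disagree."""
    if any(base not in EXPANSIONS for base in codon):
        return "X"  # unsupported symbol
    readings = {CODON_TABLE["".join(p)] for p in product(*(EXPANSIONS[b] for b in codon))}
    return readings.pop() if len(readings) == 1 else "X"


def translate_ambiguous(dna: str) -> str:
    """Translate frame 1 codon by codon (stop codons are kept as '*', not used to halt)."""
    dna = "".join(dna.split()).upper()
    return "".join(translate_codon(dna[i:i + 3]) for i in range(0, len(dna) - 2, 3))


rows = ("ATGAATATTC ACACCCTGCT CTCCAAACAA TGGACGCTGC CGCCATTCCT"
        "GCCGAAACGG CTGCTGCTGT CCCTGCTGAT ACTGCTNNCC CCCAATGCGG")
print(translate_ambiguous(rows))  # MNIHTLLSKQWTLPPFLPKRLLLSLLILLXPNA
```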

ORF48a and ORF48-1 show 96.8% identity in 472 aa overlap: 

10 20 30 40 50 60 

orf48a.pep MNIHTLLSKQWTLPPFLPKRLLLSLLILLXPNAVFWVLALLTATARPIVNLXYLPAALLI
I I i I I I i I I I I I I I I I I I I I I I I I I I I I I I I I I II I II I I I I I I I I II I I I II III II 
orf48-1 MNIHTLLSKQWTLPPFLPKRLLLSLLILLAPNAVFWVLALLTATARPIVNLDYLPAALLI

10 20 30 40 50 60 

70 80 90 100 110 120 

orf48a.pep ALPWRXVKIXGVLAXWLAVLFDGLMMVIQLFPFMDLIGAINLVPFIXTAPALYQIMTGLL
I I I I I Ml I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I 
orf48-1 ALPWRFVKIAGVLAFWLAVLFDGLMMVIQLFPFMDLIGAINLVPFILTAPAPYQIMTGLL

70 80 90 100 110 120 

130 140 150 160 170 180 

orf48a.pep LLYMLAMPFVLQKAAAKTDFRHIAACAAVVVAAGYFTGHLSXYDRGRMANIFGANNFYYA

I I I I I I I I I I I I 1 I I II I I I I I I I : I I I I I : I I I I I I I I I I I I I I I II I I I I I I I I I I I 
orf48-1 LLYMLAMPFVLQKAAAKTDFRHIAVCAAVVAAAGYFTGHLSYYDRGRMANIFGANNFYYA

130 140 150 160 170 180 

190 200 210 220 230 240 

orf48a.pep KSQAMLYTVSQNADFITAGLVDPVFLPLGNQQRAATHLNEPKSQKILFIVAESWGLPANP

II II I II Ml Mill Mill III II IIIIIII MIIMII II Mill I I I M llllllll 
orf48-1 KSQAMLYTVSQNADFITAGLVDPVFLPLGNQQRAATHLNEPKSQKILFIVAESWGLPANP

190 200 210 220 230 240 

250 260 270 280 290 300 

orf48a.pep ELQNATFAKLLAQKXRFSVWESGSFPFIGATIEGEMRELCAYGGLRGFALRRAPDEKFAR
I I I I I II I II I I I I II II I I I I II I II I I 1:1 I M I II M II I I M I II I I I M I M M 
orf48-1 ELQNATFAKLLAQKDRFSVWESGSFPFIGATVEGEMRELCAYGGLRGFALRRAPDEKFAR

250 260 270 280 290 300 

310 320 330 340 350 360 

orf48a.pep CLPNRLKQEGYATFAMHGAGSSLYDRFSWYPRAGFQEIKTAENLIGKKTCAIFGGVCDSE
I II I II II I II II II I I M II I I II II I I I M I II I II II I M II I I I I II I I I I I I II I 
orf48-1 CLPNRLKQEGYATFAMHGAGSSLYDRFSWYPRAGFQEIKTAENLIGKKTCAIFGGVCDSE

310 320 330 340 350 360 
370 380 390 400 410 420

orf48a.pep LFGEVSAXFKKHDKGLFYWMTLTSHADYPESDIFNHRLKCTEYGLPAETDXCRNFSLHTQ

I I I I I I I I I I i I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
orf48-1 LFGEVSAFFKKHDKGLFYWMTLTSHADYPESDIFNHRLKCTEYGLPAETDLCRNFSLHTQ
370 380 390 400 410 420 

430 440 450 460 470 

orf48a.pep FFDQLADLIQRPEMKGTEVIIVGDHPPPVGNLNETFRYLKQGHVXWLNFKIKX
I II I I I I I I I I I [ I I I I I I M i II I I 11 I I I I 1 I I I t I I I! t M II II Nil 
orf48-1 FFDQLADLIQRPEMKGTEVIIVGDHPPPVGNLNETFRYLKQGHVAWLNFKIKX

430 440 450 460 470 
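As a worked check on the 96.8% figure, assume identity is the number of identical aligned positions divided by the 472-position overlap (which here includes the terminal stop). Under that assumption only one integer count rounds to 96.8%:

```latex
\begin{gathered}
\text{identity} = \frac{n_{\text{identical}}}{L_{\text{overlap}}}, \qquad L_{\text{overlap}} = 472 \\
0.968 \times 472 \approx 456.9 \;\Rightarrow\; n_{\text{identical}} = 457, \qquad 472 - 457 = 15 \\
\frac{456}{472} \approx 0.9661, \qquad \frac{457}{472} \approx 0.9682, \qquad \frac{458}{472} \approx 0.9703
\end{gathered}
```

Counting the differing columns in the rows above also gives 15: eleven X residues in ORF48a (positions 30, 52, 66, 70, 75, 107, 162, 255, 368, 411 and 465) plus substitutions at positions 112, 145, 151 and 272.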






Homology with a predicted ORF from N. gonorrhoeae 

ORF48 shows 97.5% identity over a 119aa overlap with a predicted ORF (ORF48ng) from N. 
gonorrhoeae: 



orf48.pep MNIHTLLSKQWTLPPFLPKRLLLSLLILLAPNAVFWVLALLTATARPIVNLDYLPAALLI 60
I I I I : I I I : I I I I I i I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I
orf48ng MNIHALLSEQWTLPPFLPKRLLLSLLILLAPNAVFWVLALLTATARPIVNLDYLPAALLI 60

orf48.pep ALPWRFVKIAGVLAFWLAVLFDGLMMVIQLFPFMDLIGAINLVPFILTAPAPYQIMTGL 119
I I I I I I I I I I ! I I I I I I I I I II II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I \ I I I I
orf48ng ALPWRFVKIAGVLAFWPAVLFDGLMMVIQLFPFMDLIGAINLVPFILTAPAPYQIMTGLL 120



The ORF48ng nucleotide sequence <SEQ ID 473> was predicted to encode a protein having amino acid sequence <SEQ ID 474>:



1 MNIHALLSEQ WTLPPFLPKR LLLSLLILLA PNAVFWVLAL LTATARPIVN
51 LDYLPAALLI ALPWRFVKIA GVLAFWPAVL FDGLMMVIQL FPFMDLIGAI
101 NLVPFILTAP APYQIMTGLL LLYMLAMPFV LQKAAVKTDF RHIAVCAAVV
151 AAARYFTGPF ELLRTGGRWQ YVQHRRLLLS GSRASFRRRQ KADVLRRLGN
201 PYASMGNGG..

Further work identified the complete gonococcal DNA sequence <SEQ ID 475>:

1 ATGAATATTC ACGCCCTGCT CTCCGAACAA TGGACGCTGC CGCCATTCCT

51 GCCGAAACGG CTGCTGCTGT CCCTGCTGAT ACTGCTGGCC CCCAATGCGG

101 TGTTTTGGGT TTTGGCACTG CTGACCGCCA CCGCCCGCCC GATTGTCAAT

151 TTGGACTACC TTCCCGCCGC GCTGCTGATC GCCCTGCCTT GGCGTTTCGT

201 CAAAATTGCC GGCGTATTGG CGTTTTGGCC GGCGGTTTTG TTTGACGGGC

251 TGATGATGGT GATCCAACTC TTCCCTTTTA TGGACCTCAT CGGCGCCATC

301 AACCTCGTCC CCTTCATCCT GACCGCCCCC GCCCCTTATC AGATAATGAC

351 CGGGCTGTTG CTGCTGTATA TGCTGGCGAT GCCGTTTGTG TTGCAAAAAG

401 CCGCCGTCAA AACCGACTTC CGACACATTG CCGTCTGTGC CGCCGTTGTG

451 GCGGCAGCCG GCTATTTCAC CGGCCATTTG AGTTACTACG ACCGGGGGCG

501 GATGGCCAAT ATCTTCGGCG CAAACAACTT CTATTACGCC aAAAGTCAGG

551 CGATGCTCTA CACCGTCAGC CAGAATGCCG ACTTTATTAC CGCCGgcctG

601 GTCGACCCCG TCTTCCTCCC CTTGGGCAAT CAGCAGCGTG CCGCCACGCG

651 GCTGAGTGAG CCGAAATCTC AAAAAATCCT CTTTATCGTC GCCGAATCTT

701 GGGGGCTGCC GGGCAATCCC GAGCTTCAAA ACGCCACTTT TGCCAAACTG

751 CTGGCGCAAA AAGACCGTTT TTCGGTTTGG GAAAGCGGCA GTTTTCCCTT

801 CATCGGCGCG ACGGTCGAAG GCGAAATGCG CGAATTGTGC GCCTACGGCG

851 GTTTGCGCGG GTTCGCACTG CGCCGCGCGC CCGACGAAAA ATTTGCCCGC

901 TGCCTCCCCA ACCGTTTGAA ACAAGAAGGT TACGCCACCT TTGCGATGCA

951 CGGCGCGGGT AGTTCGCTTT ACGACCGCTT CAGCTGGTAT CCGAGGGCGG

1001 GCTTTCAAAA AATCAAAACC GCCGAAAACC TGATCGGTAA AAAAACCTGC

1051 GCCATTTTCG GCGGCGTGTG CGACAGCGAG CTGTTCGGCG AAGTGTCGGC

1101 ATTTTTCAAA AAACACGACA AGGGACTGTT TTACTGGATG ACGCTGACCA

1151 GCCACGCCGA CTATCCCGAA TCCGACATTT TCAACCACAG GCTCAAATGC

1201 ACCGAATACG GCCTGCCCGC CGAAACCGAC CTCTGCCGCA ATTTCAGCCT

1251 GCACACCCAA TtcttcgACC AACTGGCGGA TTTGATCCGA CGCCCCGAAA

1301 TGAAAGGCAC GGAAGTCATC ATCGTCGGCG ACCATCCGCC GCCCGTCGGC

1351 AACCTCAATG AAACCTTCCG CTACCTCAAA CAGGGACACG TCGCCTGGCT

1401 GCACTTCAAA ATCAAATAA
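The gonococcal sequence above reads from an ATG start to a TAA stop in a single frame. Locating such an open reading frame can be sketched as a scan for a start codon followed by the first in-frame stop. This is a minimal Python sketch, assuming ATG starts by default; the `starts` parameter can also accept TTG, the codon that opens ORF46a in this document. It scans the given strand only. `first_orf` is a hypothetical helper, and the ORF-prediction method actually used for these examples is not described here.

```python
from typing import Optional, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_orf(dna: str, starts: Tuple[str, ...] = ("ATG",)) -> Optional[Tuple[int, int]]:
    """Return 1-based (start, end) of the first start codon that reaches an in-frame stop.

    The end coordinate is the last base of the stop codon; None means no ORF was found.
    """
    dna = "".join(dna.split()).upper()
    for i in range(len(dna) - 2):
        if dna[i:i + 3] not in starts:
            continue
        for j in range(i, len(dna) - 2, 3):
            if dna[j:j + 3] in STOP_CODONS:
                return i + 1, j + 3
    return None


print(first_orf("CCATGAAATAGCC"))  # (3, 11)  (toy input)
print(first_orf("ATGAATATTC ACGCCCTGCT CTCCGAACAA TGGACGCTGC CGCCATTCCT"))  # None
```

The second call uses only row 1 of <SEQ ID 475>, which contains starts but no in-frame stop, so no complete ORF is reported.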



This encodes a protein having amino acid sequence <SEQ ID 476; ORF48ng-1>:

1 MNIHALLSEQ WTLPPFLPKR LLLSLLILLA PNAVFWVLAL LTATARPIVN

51 LDYLPAALLI ALPWRFVKIA GVLAFWPAVL FDGLMMVIQL FPFMDLIGAI

101 NLVPFILTAP APYQIMTGLL LLYMLAMPFV LQKAAVKTDF RHIAVCAAVV

151 AAAGYFTGHL SYYDRGRMAN IFGANNFYYA KSQAMLYTVS QNADFITAGL

201 VDPVFLPLGN QQRAATRLSE PKSQKILFIV AESWGLPGNP ELQNATFAKL

251 LAQKDRFSVW ESGSFPFIGA TVEGEMRELC AYGGLRGFAL RRAPDEKFAR

301 CLPNRLKQEG YATFAMHGAG SSLYDRFSWY PRAGFQKIKT AENLIGKKTC

351 AIFGGVCDSE LFGEVSAFFK KHDKGLFYWM TLTSHADYPE SDIFNHRLKC

401 TEYGLPAETD LCRNFSLHTQ FFDQLADLIR RPEMKGTEVI IVGDHPPPVG

451 NLNETFRYLK QGHVAWLHFK IK*

ORF48ng-1 and ORF48-1 show 97.9% identity in 472 aa overlap:

10 20 30 40 50 60 

orf48-1.pep MNIHTLLSKQWTLPPFLPKRLLLSLLILLAPNAVFWVLALLTATARPIVNLDYLPAALLI
| | || : I | | : | I I I I I I I II I I I I I I I I I I I j I I I I I II I I I II II I I I I I I I I I I I II I I 
orf48ng-1 MNIHALLSEQWTLPPFLPKRLLLSLLILLAPNAVFWVLALLTATARPIVNLDYLPAALLI

10 20 30 40 50 60 

70 80 90 100 110 120 

orf48-1.pep ALPWRFVKIAGVLAFWLAVLFDGLMMVIQLFPFMDLIGAINLVPFILTAPAPYQIMTGLL

1 1 1 1 1 1 II 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 ! 1 1 1 1 1 1 1 1 M I i I ll 1 1 1 1 1

orf48ng-1 ALPWRFVKIAGVLAFWPAVLFDGLMMVIQLFPFMDLIGAINLVPFILTAPAPYQIMTGLL

70 80 90 100 110 120 

130 140 150 160 170 180 

orf48-1.pep LLYMLAMPFVLQKAAAKTDFRHIAVCAAVVAAAGYFTGHLSYYDRGRMANIFGANNFYYA

I I I I I I I I I I II I I I : I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I I I 
orf48ng-1 LLYMLAMPFVLQKAAVKTDFRHIAVCAAVVAAAGYFTGHLSYYDRGRMANIFGANNFYYA

130 140 150 160 170 180 

190 200 210 220 230 240

orf48-1.pep KSQAMLYTVSQNADFITAGLVDPVFLPLGNQQRAATHLNEPKSQKILFIVAESWGLPANP
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I : I : I I I I I I I I I I I I I I II I I : I I 
orf48ng-1 KSQAMLYTVSQNADFITAGLVDPVFLPLGNQQRAATRLSEPKSQKILFIVAESWGLPGNP

190 200 210 220 230 240 


250 260 270 280 290 300 

orf48-1.pep ELQNATFAKLLAQKDRFSVWESGSFPFIGATVEGEMRELCAYGGLRGFALRRAPDEKFAR
I | | | | I I | | I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I II I I I I I I I I I I I I I I 
orf48ng-1 ELQNATFAKLLAQKDRFSVWESGSFPFIGATVEGEMRELCAYGGLRGFALRRAPDEKFAR
250 260 270 280 290 300

310 320 330 340 350 360 

orf48-1.pep CLPNRLKQEGYATFAMHGAGSSLYDRFSWYPRAGFQEIKTAENLIGKKTCAIFGGVCDSE
I | | | | I I I I II I I I I I I I I I I I I I I I I I I I I I I II I : I I I I I I I I I I I I I I I I I I I I I I I 
orf48ng-1 CLPNRLKQEGYATFAMHGAGSSLYDRFSWYPRAGFQKIKTAENLIGKKTCAIFGGVCDSE

310 320 330 340 350 360 

370 380 390 400 410 420 

orf48-1.pep LFGEVSAFFKKHDKGLFYWMTLTSHADYPESDIFNHRLKCTEYGLPAETDLCRNFSLHTQ
I I I I I I I I I I I I M I I I I I I I I I I I II I I I I I I I I I I I I I I I I I II I I I I I I I I I I II I I

orf48ng-1 LFGEVSAFFKKHDKGLFYWMTLTSHADYPESDIFNHRLKCTEYGLPAETDLCRNFSLHTQ

370 380 390 400 410 420 

430 440 450 460 470 

orf48-1.pep FFDQLADLIQRPEMKGTEVIIVGDHPPPVGNLNETFRYLKQGHVAWLNFKIKX

I I || I I I I I : I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I II I I I I : I I I I I 
orf48ng-1 FFDQLADLIRRPEMKGTEVIIVGDHPPPVGNLNETFRYLKQGHVAWLHFKIKX

430 440 450 460 470 

Based on this analysis, including the presence of a putative leader sequence (double-underlined) and two putative transmembrane domains (single-underlined) in the gonococcal protein, it is predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies.
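The section does not state how the leader sequence and transmembrane domains were predicted. One widely used approach, swapped in here purely for illustration, is a Kyte-Doolittle hydropathy scan: average the residue hydropathy over a 19-residue window and flag windows whose mean exceeds about 1.6. A minimal Python sketch under those assumptions follows. `hydrophobic_windows` is a hypothetical helper, and residues such as X or * are scored as 0. It is applied to residues 1-60 of ORF48ng-1 (<SEQ ID 476>).

```python
# Kyte & Doolittle (1982) hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobic_windows(protein: str, window: int = 19, threshold: float = 1.6):
    """Return 1-based (start, end, mean) for each window whose mean hydropathy exceeds threshold."""
    scores = [KYTE_DOOLITTLE.get(aa, 0.0) for aa in protein.upper()]  # X, * etc. score 0
    hits = []
    for i in range(len(scores) - window + 1):
        mean = sum(scores[i:i + window]) / window
        if mean > threshold:
            hits.append((i + 1, i + window, round(mean, 2)))
    return hits


# Residues 1-60 of ORF48ng-1 (<SEQ ID 476>).
n_term = "MNIHALLSEQWTLPPFLPKRLLLSLLILLAPNAVFWVLALLTATARPIVNLDYLPAALLI"
for start, end, mean in hydrophobic_windows(n_term):
    print(start, end, mean)
```

Overlapping hits are usually merged into a single candidate segment. A hydropathy plot alone cannot distinguish a cleavable leader peptide from a membrane anchor.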
Example 57

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 477>: 

1 ..GTGAGCGGAC GTTACCGCGC TTTGGATCGC GTTTCCAAAA TCATCATCGT

51 TACTTTGAGT ATCGCCACGC TTGCCGCCGC CGGCATCGCT ATGTCGCGCG 

101 GTATGCAGAT GCAGTCCGAT TTTATCGAGC CGACACCGTG GACGCTTGCC 

151 GGTTTGGGCT TCCTGATCGC GCTGATGGGC TGGATGCCCG CGCCGATTGA 

201 AATTTCCGCC ATCAATTCTT TGTGGGTAAC CGAAAAACAA CGCATCAATC 

251 CTTCCGAATA CCGCGACGGG ATTTTTGAAT TCAACGTCGG TTATATCGCC 

301 AGTGCGGTTT TGGCTTTGGT TTTCCTTGCA CTGGGCGC.G TAGCGCCGAA 

351 CGGCAACGGC GA.ACAGTGC AGATGGCGGG CGGCAAATAT AACGGGCAAT

401 TGATCAATAT GTACGCC..

This corresponds to the amino acid sequence <SEQ ID 478; ORF53>: 



1 ..VSGRYRALDR VSKIIIVTLS IATLAAAGIA MSRGMQMQSD FIEPTPWTLA
51 GLGFLIALMG WMPAPIEISA INSLWVTEKQ RINPSEYRDG IFEFNVGYIA
101 SAVLALVFLA LGXVAPNGNG XTVQMAGGKY NGQLINMYA..

Further work revealed the complete nucleotide sequence <SEQ ID 479>: 



1 ATGTCCGAAC AACATATTTC GACTTGGAAA AGTAAAATCA ACGCATTGGG 

51 TCCGGGGATC ATGATGGCTT CGGCGGCGGT CGGCGGTTCG CACCTGATTG 

101 CCTCGACGCA GGCGGGCGCG CTTTACGGCT GGCAGATCGC GCTCATCATC 

151 ATCCTGACCA ACCTCTTCAA ATACCCGTTT TTCCGCTTCA GCGCGCATTA 

201 CACGCTGGAC ACGGGCAAGA GCCTGATTGA AGGTTATGCC GAGAAAAGCC 

251 GCGTTTATTT GTGGGTATTC CTGATTTTGT GCATCCTCTC CGCCACGATT 

301 AACGCGGGCG CGGTCGCCAT TGTAACCGCC GCCATCGTCA AAATGGCGAT 

351 TCCCTCGCTG ATGTTTGATG CCGGCACGGT TGCCGCCTTG ATTATGGCAT 

401 CCTGCCTGAT TATTTTGGTG AGCGGACGTT ACCGCGCTTT GGATCGCGTT

451 TCCAAAATCA TCATCGTTAC TTTGAGTATC GCCACGCTTG CCGCCGCCGG

501 CATCGCTATG TCGCGCGGTA TGCAGATGCA GTCCGATTTT ATCGAGCCGA 

551 CACCGTGGAC GCTTGCCGGT TTGGGCTTCC TGATCGCGCT GATGGGCTGG 

601 ATGCCCGCGC CGATTGAAAT TTCCGCCATC AATTCTTTGT GGGTAACCGA 

651 AAAACAACGC ATCAATCCTT CCGAATACCG CGACGGGATT TTTGATTTCA 

701 ACGTCGGTTA TATCGCCAGT GCGGTTTTGG CTTTGGTTTT CCTTGCACTG 

751 GGCGCGTTTG TGCAATACGG CAACGGCGAA GCAGTGCAGA TGGCGGGCGG 

801 CAAATATATC GGGCAATTGA TCAATATGTA CGCCGTTACC ATCGGCGGCT 

851 GGTCGCGCCC GCTGGTGGCG TTTATCGCGT TTGCCTGTAT GTACGGCACG 

901 ACGATTACCG TCGTGGACGG CTATGCCCGT GCCATTGCCG AACCCGTGCG 

951 CCTGCTGCGC GGAAAAGACA AAACGGGCAA CGCCGAATTC TTTGCCTGGA 

1001 ATATTTGGGT GGCGGGCAGC GGTTTGGCGG TGATTTTCTG GTTTGACGGC 

1051 GTAATGGCGA ATCTGCTCAA ATTTGCGATG ATTGCCGCTT TTGTGTCCGC 

1101 CCCTGTGTTT GCCTGGCTGA ATTACCGTTT GGTTAAAGGT GATGAAAAAC 

1151 ACAAACTCAC ATCAGGTATG AATGCCCTTG CATTGGCAGG CTTGATTTAT 

1201 CTGACCGGTT TTACCGTTTT GTTCTTATTG AATTTGGCGG GAATGTTCAA 

1251 ATGA 

This corresponds to the amino acid sequence <SEQ ID 480; ORF53-l>: 



1 MSEQHISTWK SKINALGPGI MMASAAVGGS HLIASTQAGA LYGWQIALII

51 ILTNLFKYPF FRFSAHYTLD TGKSLIEGYA EKSRVYLWVF LILCILSATI

101 NAGAVAIVTA AIVKMAIPSL MFDAGTVAAL IMASCLIILV SGRYRALDRV

151 SKIIIVTLSI ATLAAAGIAM SRGMQMQSDF IEPTPWTLAG LGFLIALMGW

201 MPAPIEISAI NSLWVTEKQR INPSEYRDGI FDFNVGYIAS AVLALVFLAL

251 GAFVQYGNGE AVQMAGGKYI GQLINMYAVT IGGWSRPLVA FIAFACMYGT

301 TITVVDGYAR AIAEPVRLLR GKDKTGNAEF FAWNIWVAGS GLAVIFWFDG

351 VMANLLKFAM IAAFVSAPVF AWLNYRLVKG DEKHKLTSGM NALALAGLIY

401 LTGFTVLFLL NLAGMFK*



Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A) 

ORF53 shows 93.5% identity over a 139aa overlap with an ORF (ORF53a) from strain A of N. meningitidis:
10 20 30

orf53.pep                               VSGRYRALDRVSKIIIVTLSIATLAAAGIA

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
orf53a    AAIVKMAIPSLMFDAGTVAALIMASCLIILVSGRYRALDRVSKIIIVTLSIATLAAAGIA

110 120 130 140 150 160 

40 50 60 70 80 90 

orf53.pep MSRGMQMQSDFIEPTPWTLAGLGFLIALMGWMPAPIEISAINSLWVTEKQRINPSEYRDG
| | | | | | I | I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
orf53a MSRGMQMQSDFIEPTPWTLAGLGFLIALMGWMPAPIEISAINSLWVTEKQRINPSEYRDG
170 180 190 200 210 220 

100 110 120 130 139 

orf53.pep IFEFNVGYIASAVLALVFLALGXVAPNGNGXTVQMAGGKYNGQLINMYA
I I : I I I I I I I I I I I I I I I I I I I : I I I : I I I I I I I I I I I I I II I 
orf53a IFDFNVGYIASAVLALVFLALGAFVQYGNGEAVQMAGGKYIGQLINMYAVTIGGWSRPLV
230 240 250 260 270 280 

orf53a AFIAFACMYGTTITVVDGYARAIAEPVRLLRGKDKTGNAEFFAWNIWVAGSGLAVIFWFD
290 300 310 320 330 340 

The complete length ORF53a nucleotide sequence <SEQ ID 481> is:

1 ATGTCCGAAC AACATATTTC GACTTGGAAA AGTAAAATCA ACGCATTGGG

51 ACCGGGGATT ATGATGGCTT CGGCGGCGGT CGGCGGTTCG CACCTGATTG

101 CCTCGACGCA GGCGGGCGCG CTTTACGGCT GGCAGATCGC GCTCATCATC

151 ATCCTGACCA ACCTCTTCAA ATACCCGTTT TTCCGCTTCA GCGCGCATTA

201 CACGCTGGAC ACGGGCAAGA GCCTGATTGA AGGTTATGCC GAGAAAAGCC

251 GCGTTTATTT GTGGGTATTC CTGATTTTGT GCATCCTCTC CGCCACGATT

301 AACGCGGGCG CGGTCGCCAT TGTAACCGCC GCCATCGTCA AAATGGCGAT

351 TCCCTCGCTG ATGTTTGATG CCGGCACGGT TGCCGCCTTG ATTATGGCAT

401 CCTGCCTGAT TATTTTGGTG AGCGGACGTT ACCGCGCTTT GGATCGCGTT

451 TCCAAAATCA TCATCGTTAC TTTGAGTATC GCCACGCTTG CCGCCGCCGG

501 CATCGCTATG TCGCGCGGTA TGCAGATGCA GTCCGATTTT ATCGAGCCGA

551 CACCGTGGAC GCTTGCCGGT TTGGGCTTCC TGATCGCGCT GATGGGCTGG

601 ATGCCCGCGC CGATTGAAAT TTCCGCCATC AATTCTTTGT GGGTAACCGA

651 AAAACAACGC ATCAATCCTT CCGAATACCG CGACGGGATT TTTGATTTCA

701 ACGTCGGTTA TATCGCCAGT GCGGTTTTGG CTTTGGTTTT CCTTGCACTG

751 GGCGCGTTTG TGCAATACGG CAACGGCGAA GCAGTGCAGA TGGCGGGCGG

801 CAAATATATC GGGCAATTGA TCAATATGTA CGCCGTTACC ATCGGCGGCT

851 GGTCGCGCCC GCTGGTGGCG TTTATCGCGT TTGCCTGTAT GTACGGCACG

901 ACGATTACCG TTGTGGACGG CTATGCCCGT GCCATTGCCG AACCCGTGCG

951 CCTGCTGCGC GGAAAAGACA AAACGGGCAA CGCCGAATTC TTTGCCTGGA

1001 ATATTTGGGT GGCGGGCAGC GGTTTGGCGG TGATTTTCTG GTTTGACGGC

1051 GTAATGGCGA ATCTGCTCAA ATTTGCGATG ATTGCCGCTT TTGTGTCCGC

1101 CCCTGTGTTT GCCTGGCTGA ATTACCGTTT GGTTAAAGGT GATGAAAAAC

1151 ACAAACTCAC ATCAGGTATG AATGCCCTTG CATTGGCAGG CTTGATTTAT

1201 CTGACCGGTT TTACCGTTTT GTTCTTATTG AATTTGGCGG GAATGTTCAA

1251 ATGA



This encodes a protein having amino acid sequence <SEQ ID 482>: 



1 MSEQHISTWK SKINALGPGI MMASAAVGGS HLIASTQAGA LYGWQIALII

51 ILTNLFKYPF FRFSAHYTLD TGKSLIEGYA EKSRVYLWVF LILCILSATI

101 NAGAVAIVTA AIVKMAIPSL MFDAGTVAAL IMASCLIILV SGRYRALDRV

151 SKIIIVTLSI ATLAAAGIAM SRGMQMQSDF IEPTPWTLAG LGFLIALMGW

201 MPAPIEISAI NSLWVTEKQR INPSEYRDGI FDFNVGYIAS AVLALVFLAL

251 GAFVQYGNGE AVQMAGGKYI GQLINMYAVT IGGWSRPLVA FIAFACMYGT

301 TITVVDGYAR AIAEPVRLLR GKDKTGNAEF FAWNIWVAGS GLAVIFWFDG

351 VMANLLKFAM IAAFVSAPVF AWLNYRLVKG DEKHKLTSGM NALALAGLIY

401 LTGFTVLFLL NLAGMFK*

ORF53a shows 100.0% identity in 417 aa overlap with ORF53-1:



10 20 30 40 50 60 

orf53a.pep MSEQHISTWKSKINALGPGIMMASAAVGGSHLIASTQAGALYGWQIALIIILTNLFKYPF
I | | II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I I I I I I I I I I I I I I I I 
orf53-1 MSEQHISTWKSKINALGPGIMMASAAVGGSHLIASTQAGALYGWQIALIIILTNLFKYPF

10 20 30 40 50 60 



70 80 90 100 110 120 



orf53a.pep FRFSAHYTLDTGKSLIEGYAEKSRVYLWVFLILCILSATINAGAVAIVTAAIVKMAIPSL
I I 1 I I I I I I I ! I I I I I I I I I I I I I ! I I I I i I I I I I I I I I I I I I I i I I I I I I I ! I I I I I I I 
orf53-1 FRFSAHYTLDTGKSLIEGYAEKSRVYLWVFLILCILSATINAGAVAIVTAAIVKMAIPSL

70 80 90 100 110 120 

130 140 150 160 170 180 

orf53a.pep MFDAGTVAALIMASCLIILVSGRYRALDRVSKIIIVTLSIATLAAAGIAMSRGMQMQSDF
I | I I I | I I I I I I t I I I I I I I I I II I ) II I 1 I I I I I I I I I I ! I I 1 I I I I I I I I I I I I I I I I 
orf53-1 MFDAGTVAALIMASCLIILVSGRYRALDRVSKIIIVTLSIATLAAAGIAMSRGMQMQSDF

130 140 150 160 170 180 

190 200 210 220 230 240 

orf53a.pep IEPTPWTLAGLGFLIALMGWMPAPIEISAINSLWVTEKQRINPSEYRDGIFDFNVGYIAS
| | | | | | | I I M I I I I I I I I I I I I I I I I 11 I I I I I I I I I I I I I I I I I I I I I I I II I I II I I 
orf53-1 IEPTPWTLAGLGFLIALMGWMPAPIEISAINSLWVTEKQRINPSEYRDGIFDFNVGYIAS

190 200 210 220 230 240 

250 260 270 280 290 300 

orf53a.pep AVLALVFLALGAFVQYGNGEAVQMAGGKYIGQLINMYAVTIGGWSRPLVAFIAFACMYGT
I I II I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II II I I I I 
orf53-1 AVLALVFLALGAFVQYGNGEAVQMAGGKYIGQLINMYAVTIGGWSRPLVAFIAFACMYGT

250 260 270 280 290 300 






310 320 330 340 350 360 

orf53a.pep TITVVDGYARAIAEPVRLLRGKDKTGNAEFFAWNIWVAGSGLAVIFWFDGVMANLLKFAM
I | | | I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I 
orf53-1 TITVVDGYARAIAEPVRLLRGKDKTGNAEFFAWNIWVAGSGLAVIFWFDGVMANLLKFAM

310 320 330 340 350 360 

370 380 390 400 410 

orf53a.pep IAAFVSAPVFAWLNYRLVKGDEKHKLTSGMNALALAGLIYLTGFTVLFLLNLAGMFKX
I I | I I I I I I I II I II I I I I I I I I I I II I I I II I I I I II II I I I I I I I I I I I I I I I I I I 
orf53-1 IAAFVSAPVFAWLNYRLVKGDEKHKLTSGMNALALAGLIYLTGFTVLFLLNLAGMFKX

370 380 390 400 410 

Homology with a predicted ORF from N. gonorrhoeae

ORF53 shows 92.1% identity over a 139aa overlap with a predicted ORF (ORF53ng) from N. gonorrhoeae:

orf53.pep                               VSGRYRALDRVSKIIIVTLSIATLAAAGIA 30
I I I I I I I I I I I I I I I I I I I I I I I I II II I I
orf53ng   AAIVKMAIPSLMFDAGTVAALIMASCLIILVSGRYRALDRVSKIIIVTLSIATLAAAGIA 91

orf53.pep MSRGMQMQSDFIEPTPWTLAGLGFLIALMGWMPAPIEISAINSLWVTEKQRINPSEYRDG 90
I i I I I I I I 1 I I I I II M I I I I I I I I I I I I I I I I I I I I I I I I I I I M I I I II I I I I I I M
orf53ng   MSRGMQMQPDFIEPTPWTLAGLGFLIALMGWMPAPIEISAINSLWVTEKQRINPSEYRDG 151

orf53.pep IFEFNVGYIASAVLALVFLALGXVAPNGNGXTVQMAGGKYNGQLINMYA 139
I I r | t | | 1 I I [ t I I i I 1 I r i f 1 : Ml :III:MII Mill III
orf53ng   IFDFNVGYIASAVLALVFLALGAFVQYGNGEAVQMGGGKYIGQLINMYAVTIGGGSRPLV 211



An ORF53ng nucleotide sequence <SEQ ID 483> was predicted to encode a protein having amino acid sequence <SEQ ID 484>:



1 MPKKSCVYLW VFLILCIASA TINAGAVAIV TAAIVKMAIP SLMFDAGTVA

51 ALIMASCLII LVSGRYRALD RVSKIIIVTL SIATLAAAGI AMSRGMQMQP

101 DFIEPTPWTL AGLGFLIALM GWMPAPIEIS AINSLWVTEK QRINPSEYRD

151 GIFDFNVGYI ASAVLALVFL ALGAFVQYGN GEAVQMGGGK YIGQLINMYA

201 VTIGGGSRPL VAFIAFACMY GAASTVVDGY ARAIAEPVRL LRGKDKTARP

251 IVLLEKLGGR HRFGRDFLV*

Further analysis revealed a further partial gonococcal DNA sequence <SEQ ID 485>:

1 ..aagaAAAGCT GCGTTTATTT GTGGGTTTTT TTGATTTTGT GTATCGCCTC

51 CGCCACGATT AACGCGGGCG CGGTCGCCAT TGTAACCGCC GCCATCGTCA 
101 AAATGGCGAT TCCCTCGCTG ATGTTTGATG CCGGCACGGT TGCCGCCTTG 
151 ATTATGGCAT CCTGCCTGAT TATTTTGGTG AGCGGACGTT ACCGCGCTTT

201 GGATCGTGTT TCCAAAATCA TCATTGTTAC TTTGAGCATC GCCACGCTTG 

251 CCGCCGCCGG CATCGCTATG TCGCGCGGTA TGCAGATGCA GCCCGATTTT 

301 ATCGAGCCGA CACCGTGGAC GCTTGCCGGT TTGGGCTTCC TGATCGCGCT 

351 GATGGGCTGG ATGCCCGCGC CGATCGAAAT TTCCGCCATC AATTCTTTGT 

401 GGGTAACCGA AAAACAACGC ATCAATCCTT CTGAATACCG CGACGGGATT 

451 TTCGATTTCA ACGTCGGTTA TATCGCcagT GCGGTTTTGG CTTTGGTTTT 

501 CCTTGCACTG GGCGCGTTTG TGCAATACGG CAACGGCGAA GCAGTGCAGA 

551 TGGCGGGCGG CAAATATATC GGGCAATTGA TTAATATGTA TGCCGTAACC 

601 ATCGGCGGCT GGTCTCGTCC GCTGGTGGCG TTTATCGCGT TTGCCTGTAT 

651 GTACGGCACG ACGATTACCG TTGTGGACGG TTATGCGCGT GCCATTGCCG 

701 AACCCGTGCG CCTGCTGCGC GGCAGGGATA AAACCGGCAA CGCCGAGTTG 

751 TTtgccTGGA ATATTTGGGT GGCGGGCAGC GGTTTGGCGG TGATTTTCTG 

801 GTTTGACggc gcaaTGGCgG AACtgcTCAA ATTTGCGATG ATtgccgcCT 

851 TTGTGTCCGC CCCTGTGTTC GCCTGGCTCA ACTACCGCCT CGTCAAAGGG 

901 GACAAACGCC ACAGGCTTAC CGCCGGTATG AACGCCCTTG CCATTGTCGG 

951 CCTGCTCTAC CTGGCCGGGT TTGCCGTTTT GTTCCTGTTG AACCTTACCG 

1001 GACTTTTGGC ATAG 

This corresponds to the amino acid sequence <SEQ ID 486; ORF53ng-1>:



1 ..KKSCVYLWVF LILCIASATI NAGAVAIVTA AIVKMAIPSL MFDAGTVAAL 

51 IMASCLIILV SGRYRALDRV SKIIIVTLSI ATLAAAGIAM SRGMQMQPDF

101 IEPTPWTLAG LGFLIALMGW MPAPIEISAI NSLWVTEKQR INPSEYRDGI

151 FDFNVGYIAS AVLALVFLAL GAFVQYGNGE AVQMAGGKYI GQLINMYAVT

201 IGGWSRPLVA FIAFACMYGT TITVVDGYAR AIAEPVRLLR GRDKTGNAEL

251 FAWNIWVAGS GLAVIFWFDG AMAELLKFAM IAAFVSAPVF AWLNYRLVKG

301 DKRHRLTAGM NALAIVGLLY LAGFAVLFLL NLTGLLA*

ORF53ng-1 and ORF53-1 show 94.0% identity in 336 aa overlap:



                    60        70        80        90       100       110
orf53-1.pep ILTNLFKYPFFRFSAHYTLDTGKSLIEGYAEKSRVYLWVFLILCILSATINAGAVAIVTA
                                          :|| ||||||||||| ||||||||||||||
orf53ng-1                                 KKSCVYLWVFLILCIASATINAGAVAIVTA
                                                  10        20        30

                   120       130       140       150       160       170
orf53-1.pep AIVKMAIPSLMFDAGTVAALIMASCLIILVSGRYRALDRVSKIIIVTLSIATLAAAGIAM
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf53ng-1   AIVKMAIPSLMFDAGTVAALIMASCLIILVSGRYRALDRVSKIIIVTLSIATLAAAGIAM
                    40        50        60        70        80        90

                   180       190       200       210       220       230
orf53-1.pep SRGMQMQSDFIEPTPWTLAGLGFLIALMGWMPAPIEISAINSLWVTEKQRINPSEYRDGI
            ||||||| ||||||||||||||||||||||||||||||||||||||||||||||||||||
orf53ng-1   SRGMQMQPDFIEPTPWTLAGLGFLIALMGWMPAPIEISAINSLWVTEKQRINPSEYRDGI
                   100       110       120       130       140       150

                   240       250       260       270       280       290
orf53-1.pep FDFNVGYIASAVLALVFLALGAFVQYGNGEAVQMAGGKYIGQLINMYAVTIGGWSRPLVA
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf53ng-1   FDFNVGYIASAVLALVFLALGAFVQYGNGEAVQMAGGKYIGQLINMYAVTIGGWSRPLVA
                   160       170       180       190       200       210

                   300       310       320       330       340       350
orf53-1.pep FIAFACMYGTTITVVDGYARAIAEPVRLLRGKDKTGNAEFFAWNIWVAGSGLAVIFWFDG
            |||||||||||||||||||||||||||||||:|||||||:||||||||||||||||||||
orf53ng-1   FIAFACMYGTTITVVDGYARAIAEPVRLLRGRDKTGNAELFAWNIWVAGSGLAVIFWFDG
                   220       230       240       250       260       270

                   360       370       380       390       400       410
orf53-1.pep VMANLLKFAMIAAFVSAPVFAWLNYRLVKGDEKHKLTSGMNALALAGLIYLTGFTVLFLL
            :||:|||||||||||||||||||||||||||::|:||:||||||::||:||:||:|||||
orf53ng-1   AMAELLKFAMIAAFVSAPVFAWLNYRLVKGDKRHRLTAGMNALAIVGLLYLAGFAVLFLL
                   280       290       300       310       320       330

orf53-1.pep NLAGMFKX
            ||:|::
orf53ng-1   NLTGLLAX
Based on this analysis, including the presence of a putative leader sequence (double-underlined)
and several putative transmembrane domains (single-underlined) in the gonococcal protein, it is
predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be
useful antigens for vaccines or diagnostics, or for raising antibodies.



Example 58 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 487>:



1 ..TTGCGGGAAA CGGCATATGT TTTGGATAGT TTTGATCGTT ATTTTGTTGT

51 TGCGCTTGCC GGCTTGTTTT TTGTCCGCGC ACAATCCGAA CGCGAGTGGA

101 TGCGCGAGGT TTCTGCGTGG CAGGAAAAGA AAGGGGAAAA ACAGGCGGAG

151 CTGCCTGAAA TCAAAGACGG TATGCCCGAT TTTCCCGAAC TTGCCCTGAT

201 GCTTTTCCAC GCCGTCAAAA CGGCAGTGTA TTGGCTGTTT GTCGGTGTCG

251 TCCGTTTCTG CCGAAACTAT CTGGCGCACG AATCCGAACC GGACAGGCCC

301 GTTCCGCCT..



This corresponds to the amino acid sequence <SEQ ID 488; ORF58>: 

1 ..LRETAYVLDS FDRYFVVALA GLFFVRAQSE REWMREVSAW QEKKGEKQAE
51 LPEIKDGMPD FPELALMLFH AVKTAVYWLF VGVVRFCRNY LAHESEPDRP
101 VPP..

Further work revealed the complete nucleotide sequence <SEQ ID 489>: 



1 ATGTTTTGGA TAGTTTTGAT CGTTATTTTG TTGCTTGCGC TTGCCGGCTT

51 GTTTTTTGTC CGCGCACAAT CCGAACGCGA GTGGATGCGC GAGGTTTCTG

101 CGTGGCAGGA AAAGAAAGGG GAAAAACAGG CGGAGCTGCC TGAAATCAAA

151 GACGGTATGC CCGATTTTCC CGAACTTGCC CTGATGCTTT TCCATGCCGT

201 CAAAACGGCA GTGTATTGGC TGTTTGTCGG TGTCGTCCGT TTCTGCCGAA

251 ACTATCTGGC GCACGAATCC GAACCGGACA GGCCCGTTCC GCCTGCTTCT

301 GCAAACCGTG CGGATGTTCC GACCGCATCC GACGGATATT CAGACAGTGG

351 AAACGGGACG GAAGAAGCGG AAACGGAAGA AGCAGAAGCT GCGGAGGAAG

401 AGGCTGCCGA TACGGAAGAC ATTGCAACTG CCGTAATCGA CAACCGCCGC

451 ATCCCATTCG ACCGGAGTAT TGCTGAAGGG TTGATGCCGT CTGAAAGCGA

501 AATTTCGCCC GTCCGTCCGG TTTTTAAAGA AATCACTTTG GAAGAAGCAA

551 CGCGTGCTTT AAACAGCGCG GCTTTAAGGG AAACGAAAAA ACGCTATATC

601 GATGCATTTG AGAAAAACGA AACAGCGGTC CCCAAAGTCC GCGTGTCCGA

651 TACCCCGATG GAAGGGCTGC AGATTATCGG TTTGGACGAC CCTGTGCTTC

701 AACGCACGTA TTCCCATATG TTCGATGCGG ACAAAGAAGC GTTTTCCGAG

751 TCTGCGGATT ACGGATTTGA GCCGTATTTT GAGAAGCAGC ATCCGTCTGC

801 CTTTTCTGCA GTCAAAGCCG AAAATGCACG GAATGCGCCG TTCCACCGTC

851 ATGCAGGGCA GGGGAAAGGG CAGGCGGAGG CAAAATCCCC GGATGTTTCC

901 CAAGGGCAGT CCGTTTCAGA CGGCACGGCC GTCCGCGATG CCCGCCGCCG

951 CGTTTCCGTC AATTTGAAAG AACCGAACAA GGCAACGGTT TCTGCGGAGG

1001 CGCGAATTTC TCGCCTGATT CCGGAAAGTC AGACGGTTGT CGGGAAACGG

1051 GATGTCGAAA TGCCGTCTGA AACCGAAAAT GTTTTCACGG AAACCGTTTC

1101 GTCTGTGGGA TACGGCGGTC CGGTTTATGA TGAAACTGCC GATATCCATA

1151 TTGAAGAACC TGCCGCGCCC GATGCTTGGG TGGTCGAACC ACCCGAAGTG

1201 CCGAAAGTTC CCATGACCGC AATCGATATT CAGCCGCCGC CTCCCGTATC

1251 GGAAATCTAC AACCGTACCT ATGAACCGCC GTCAGGATTC GAGCAGGTGC

1301 AACGCAGCCG CATTGCCGAG ACCGACCATC TTGCCGATGA TGTTTTGAAT

1351 GGAGGTTGGC AGGAGGAAAC CGCCGCTATT GCGGATGACG GCAGTGAAGG

1401 TGCGGCAGAG CGGTCAAGCG GGCAATATCT GTCGGAAACC GAAGCGTTCG

1451 GGCATGACAG TCAGGCGGTT TGTCCGTTTG AAAATGTGCC GTCTGAACGC

1501 CCGTCCTGCC GGGTATCGGA TACGGAAGCG GATGAAGGGG CGTTCCCATC

1551 TGAAGAAACC GGTGCGGTAT CCGAACACCT GCCGACAACC GACCTGCTTC

1601 TGCCTCCGCT GTTCAATCCC GAGGCGACGC AAACCGAAGA AGAACTGTTG

1651 GAAAACAGCA TCACCATCGA AGAAAAATTG GCGGAGTTCA AAGTCAAGGT

1701 CAAGGTTGTC GATTCTTATT CCGGCCCCGT AATTACGCGT TATGAAATCG

1751 AACCCGATGT CGGCGTGCGC GGCAATTCCG TTCTGAATCT GGAAAAAGAT

1801 TTGGCGCGTT CGCTCGGCGT GGCTTCCATC CGCGTTGTCG AAACCATCCC

1851 CGGCAAAACC TGCATGGGTT TGGAACTTCC GAACCCGAAA CGCCAAATGA

1901 TACGCCTGAG CGAAATCTTC AATTCGCCCG AGTTTGCCGA ATCCAAATCC

1951 AAGCTGACGC TCGCGCTCGG TCAGGACATC ACCGGACAGC CCGTCGTAAC

2001 CGACTTGGGA AAAGCACCGC ATTTGTTGGT TGCCGGCACG ACCGGTTCGG

2051 GCAAATCGGT GGGTGTCAAC GCGATGATTC TGTCTATGCT TTTCAAAGCC

2101 GCGCCGGAAG ACGTGCGTAT GATTATGATC GATCCGAAAA TGCTGGAATT

2151 GAGCATTTAC GAAGGCATCC CGCACCTGCT CGCCCCTGTC GTTACCGATA

2201 TGAAGCTGGC GGCAAACGCG CTGAACTGGT GTGTTAACGA AATGGAAAAA

2251 CGCTACCGCC TGATGAGCTT TATGGGCGTG CGTAATCTTG CGGGCTTCAA

2301 TCAAAAAATC GCCGAAGCCG CAGCAAGGGG AGAAAAAATC GGCAATCCGT

2351 TCAGCCTCAC GCCCGACGAT CCCGAACCTT TGGAAAAACT GCCGTTTATC

2401 GTGGTCGTGG TCGATGAGTT TGCCGACCTG ATGATGACGG CAGGCAAGAA

2451 AATCGAAGAA CTGATTGCCC GCCTCGCCCA AAAAGCCCGC GCGGCAGGCA

2501 TCCATTTGAT TCTTGCCACA CAACGCCCCA GCGTCGATGT CATCACGGGT

2551 CTGATTAAGG CGAACATCCC GACGCGTATC GCGTTCCAAG TGTCCAGCAA

2601 AATCGACAGC CGCACGATTC TCGACCAAAT GGGCGCGGAA AACCTGCTCG

2651 GTCAGGGCGA TATGCTGTTC CTGCTGCCGG GTACTGCCTA TCCGCAGCGC

2701 GTTCACGGCG CGTTTGCCTC GGATGAAGAG GTGCACCGCG TGGTCGAATA

2751 TTTGAAACAG TTTGGCGAAC CGGACTATGT TGACGATATT TTGAGCGGCG

2801 GCGGCAGCGA AGAGCTGCCC GGCATCGGGC GCAGCGGCGA CGACGAAACC

2851 GATCCGATGT ACGACGAGGC CGTATCCGTT GTCCTGAAAA CGCGCAAAGC

2901 CAGCATTTCG GGCGTACAGC GCGCCTTGCG TATCGGCTAC AACCGCGCCG

2951 CGCGTCTGAT TGACCAGATG GAGGCGGAAG GCATTGTGTC CGCACCGGAA

3001 CACAACGGCA ACCGTACGAT TCTCGTCCCC TTGGACAATG CTTGA



This corresponds to the amino acid sequence <SEQ ID 490; ORF58-1>:



1 MFWIVLIVIL LLALAGLFFV RAQSEREWMR EVSAWQEKKG EKQAELPEIK

51 DGMPDFPELA LMLFHAVKTA VYWLFVGVVR FCRNYLAHES EPDRPVPPAS

101 ANRADVPTAS DGYSDSGNGT EEAETEEAEA AEEEAADTED IATAVIDNRR

151 IPFDRSIAEG LMPSESEISP VRPVFKEITL EEATRALNSA ALRETKKRYI

201 DAFEKNETAV PKVRVSDTPM EGLQIIGLDD PVLQRTYSHM FDADKEAFSE

251 SADYGFEPYF EKQHPSAFSA VKAENARNAP FHRHAGQGKG QAEAKSPDVS

301 QGQSVSDGTA VRDARRRVSV NLKEPNKATV SAEARISRLI PESQTVVGKR

351 DVEMPSETEN VFTETVSSVG YGGPVYDETA DIHIEEPAAP DAWVVEPPEV

401 PKVPMTAIDI QPPPPVSEIY NRTYEPPSGF EQVQRSRIAE TDHLADDVLN

451 GGWQEETAAI ADDGSEGAAE RSSGQYLSET EAFGHDSQAV CPFENVPSER

501 PSCRVSDTEA DEGAFPSEET GAVSEHLPTT DLLLPPLFNP EATQTEEELL

551 ENSITIEEKL AEFKVKVKVV DSYSGPVITR YEIEPDVGVR GNSVLNLEKD

601 LARSLGVASI RVVETIPGKT CMGLELPNPK RQMIRLSEIF NSPEFAESKS

651 KLTLALGQDI TGQPVVTDLG KAPHLLVAGT TGSGKSVGVN AMILSMLFKA

701 APEDVRMIMI DPKMLELSIY EGIPHLLAPV VTDMKLAANA LNWCVNEMEK

751 RYRLMSFMGV RNLAGFNQKI AEAAARGEKI GNPFSLTPDD PEPLEKLPFI

801 VVVVDEFADL MMTAGKKIEE LIARLAQKAR AAGIHLILAT QRPSVDVITG

851 LIKANIPTRI AFQVSSKIDS RTILDQMGAE NLLGQGDMLF LLPGTAYPQR

901 VHGAFASDEE VHRVVEYLKQ FGEPDYVDDI LSGGGSEELP GIGRSGDDET

951 DPMYDEAVSV VLKTRKASIS GVQRALRIGY NRAARLIDQM EAEGIVSAPE 

1001 HNGNRTILVP LDNA* 



Computer analysis of this amino acid sequence predicts the indicated transmembrane region, and 
also gave the following results: 

Homology with a predicted ORF from N. meningitidis (strain A) 

ORF58 shows 96.6% identity over an 89aa overlap with an ORF (ORF58a) from strain A of N.
meningitidis:



                    10        20        30        40        50        60
orf58.pep   LRETAYVLDSFDRYFVVALAGLFFVRAQSEREWMREVSAWQEKKGEKQAELPEIKDGMPD
                          :::|||||||||||||||||||||||||||||||||||||||||||
orf58a           MFWIVLIVILLLALAGLFFVRAQSEREWMREVSAWQEKKGEKQAELPEIKDGMPD
                         10        20        30        40        50

                    70        80        90       100
orf58.pep   FPELALMLFHAVKTAVYWLFVGVVRFCRNYLAHESEPDRPVPP
            |||||||||||||||||||||||||||||||||||||||||||
orf58a      FPELALMLFHAVKTAVYWLFVGVVRFCRNYLAHESEPDRPVPPASANRADVPTASDGYSD
               60        70        80        90       100       110
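The identity figures quoted for these comparisons are the number of identical aligned residues divided by the length of the overlap. As a minimal sketch, assuming both rows are already aligned to the same length, that '-' marks a gap, and that columns gapped in both rows are ignored (the alignment program's own conventions are not stated in the specification), the 96.6% value for the 89 aa ORF58/ORF58a overlap can be reproduced: the overlap is three substitutions (F/L, V/L, V/L) followed by 86 identical residues, i.e. 86/89. The helper name `percent_identity` is hypothetical.

```python
def percent_identity(row_a: str, row_b: str) -> float:
    """Percent identity between two pre-aligned sequence rows.

    Assumption: rows have equal length and use '-' for gaps; columns that
    are gaps in both rows are skipped, all others count towards the overlap.
    """
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    columns = [(a, b) for a, b in zip(row_a, row_b) if not (a == "-" and b == "-")]
    if not columns:
        raise ValueError("alignment has no columns")
    identical = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * identical / len(columns)


# The 89 aa ORF58/ORF58a overlap: F/L, V/L, V/L, then 86 identical residues.
SHARED = (
    "ALAGLFFVRA" "QSEREWMREV" "SAWQEKKGEK" "QAELPEIKDG" "MPDFPELALM"
    "LFHAVKTAVY" "WLFVGVVRFC" "RNYLAHESEP" "DRPVPP"
)
orf58 = "FVV" + SHARED
orf58a = "LLL" + SHARED
print(f"{percent_identity(orf58, orf58a):.1f}%")  # 96.6%
```

Counting gapped columns in the denominator is one common choice; a program that excludes them would report a higher figure for gapped alignments.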
The complete length ORF58a nucleotide sequence <SEQ ID 491> is:

1 ATGTTTTGGA TAGTTTTGAT CGTTATTTTG TTGCTTGCGC TTGCCGGCTT 

51 GTTTTTTGTC CGCGCACAAT CCGAACGCGA GTGGATGCGC GAGGTTTCTG 

101 CGTGGCAGGA AAAGAAAGGG GAAAAACAGG CGGAGCTGCC TGAAATCAAA 

151 GACGGTATGC CCGATTTTCC CGAACTTGCC CTGATGCTTT TCCATGCCGT 

201 CAAAACGGCA GTGTATTGGC TGTTTGTCGG TGTCGTCCGT TTCTGCCGAA 

251 ACTATCTGGC GCACGAATCC GAACCGGACA GGCCCGTTCC GCCTGCTTCT 

301 GCAAATCGTG CGGATGTTCC GACCGCATCC GACGGATATT CAGACAGTGG 

351 AAACGGGACG GAAGAAGCGG AAACGGAAGA AGCAGAAGCT GCGGAGGAAG 

401 AGGCTGCCGA TACGGAAGAC ATTGCAACTG CCGTAATCGA CAACCGCCGC 

451 ATCCCATTCG ACCGGAGTAT TGCTGAAGGG TTGATGCCGT CTGAAAGCGA

501 AATTTCGCCC GTCCGTCCGG TTTTTAAGGA AATCACTTTG GAAGAAGCAA 

551 CGCGTGCTTT AAACAGCGCG GCTTTAAGGG AAACGAAAAA ACGCTATATC 

601 GATGCATTTG AGAAAAACGA AACAGCGGTC CCCAAAGTCC GCGTGTCCGA 

651 TACCCCGATG GAAGGGCTGC AGATTATCGG TTTGGACGAC CCTGTGCTTC 

701 AACGCACGTA TTCCCGTATG TTCGATGCGG ACAAAGAAGC GTTTTCCGAG 

751 TCTGCGGATT ACGGATTTGA GCCGTATTTT GAGAAGCAGC ATCCGTCTGC 

801 CTTTTCTGCA GTCAAAGCCG AAAATGCACG GAATGCGCCG TTCCGCCGTC 

851 ATGCAGGGCA GGGNAAAGGG CAGGCGGAGG CNAAATCCCC GGATGTTTCC 

901 CAAGGGCAGT CCGTTTCAGA CGGCACAGCC GTCCGCGATG CCNGCCGCCG 

951 CGTTTCCGTC AATTTGAAAG AACCGAACAA GGCAACGGTT TCTGCGGAGG 

1001 CGCGGATTTC GCGCCTGATT CCGGAAAGTC GGACGGTTGT CGGGAAACGG 

1051 GATGTCGAAA TGCCGTCTGA AACCGAAAAT GTTTTCACGG AAANTGTTTC 

1101 GTCTGTGGGA TACGGCGNTC CGGTTTATGA TGAAACTGCC GATATCCATA 

1151 TTGAAGAACC TGCCGCGCCC GATGCTTGGG TGGTCGAACC ACCCGAAGTG 

1201 CCGAAAGTTC CCATGCCCGC AATNGATATT CCGCCGCCGC CTCCCGTATC 

1251 GGAAATCTAC AACCGTACCT ATGAACCGCC GGCAGGATTC GAGCAGGTGC 

1301 AACGCAGCCG CATTGCCGAA ACCGATCATC TTGCCGATGA TGTTTTGAAT 

1351 GGAGGTTGGC AGGAGGAAAC CGCCGCTATT GCGAATGACG GCAGTGAGGG 

1401 TGTGGCAGAG CGGTCAAGCG GGCAATATTT GTCGGAAACC GAAGCGTTCG 

1451 GGCATGACAG TCAGGCGGTT TGTCCGTTTG AAAATGTGCC GTCTGAACGC 

1501 CCGTCCCGCC GGGCATNGGA TACGGAAGCG GATGAAGGGG CGTTCCAATC 

1551 TGAAGAAACC GGTGCGGTAT CCGAACACCT GCCGACAACC GACCTGCTTC 

1601 TGCCGCCGCT GTTCAATCCC GGGGCGACGC AAACCGAAGA AGANCTGTTG 

1651 GANAACAGCA TCACCATCGA AGAAAAATNG GCGGAGTTCA AAGTCAAGGT 

1701 CAAGGTTGTC GATTCTTATT CCGGCCCCGT GATTACGCGT TATGAAATCG 

1751 AACCCGATGT CGGCGTGCGC GGCAATTCCG TTCTAAATCT GGAAAAAGAN 

1801 TTGGCGCGTT CGCTCGGCGT GGCTTCCATC CGCGTTGTCG AAACCATCCT 

1851 CGGCAAAACC TGTATGGGTT TGGAACTTCC GAACCCGAAA CGCCAAATGA 

1901 TACGCCTGAG CGAAATCTTC AATTCGCCCG AGTTTGCCGA ATCCAAATCC 

1951 AAGCTGACGC TCGCGCTCGG TCAGGACATC ACCGGACAGC CCGTCGTAAC 

2001 CGACTTGGGC AAAGCACCGC ATTTGTTGGT TGCCGGCACG ACCGGTTCGG 

2051 GCAAATCGGT GGGTGTCAAC GCGATGATTC TGTCTATGCT TTTCAAAGCC 

2101 GCGCCGGAAG ACGTGCGTAT GATTATGATC GATCCGAAAA TGCTGGAATT 

2151 GAGCATTTAC GAAGGCATCC CGCACCTGCT CGCCCCTGTC GTTACCGATA 

2201 TGAAGCTGGC GGCAAACGCG CTGAACTGGT GTGTTAACGA AATGGAAAAA 

2251 CGCTACCGCC TGATGAGCTT TATGGGCGTG CGCAATCTTG CGGGTNTCAA 

2301 TCAAAAAATC GCCGAAGCCG CAGCAAGGGG GGAGAAAATC GGCAACCCGT 

2351 TCAGCCTCAC GCCCGACAAT CCCGAACCTT TGGANAAATT GCCGTTTATC 

2401 GTGGTCGTGG TTGATGAGTT TGCCGACCTG ATGATGACGG CAGGCAAGAA 

2451 AATCGAAGAA CTGATTGCCC GCCTCGCCCA AAAAGCCCGC GCGGCAGGCA 

2501 TCCATCTTAT CCTTGCCACA CAACGCCCCA GTGTCGATGT CATCACGGGT 

2551 CTGATTAAGG CGAACATCCC GACGCGTATC GCGTTCCAAG TGTCCAGCAA 

2601 AATCGACAGC CGCACGATTC TTGACCAAAT GGGTGCGGAA AACCTGCTCG 

2651 GGCAGGGCGA TATGCTGTTC CTGCCGCCGG GTACGGCCTA TCCGCAGCGC 

2701 GTTCACGGCG CGTTTGCCTC GGATGAAGAG GTGCACCGCG TGGTCGAATA 

2751 TCTGAAACAG TTTGGCGAAC CGGACTATGT TGACGATATN TTGAGCGGCG 

2801 GTATGTCCGA CGATTTGCTG GGAATCAGCC GGAGCGGCGA CGGCGAAACC 

2851 GATCCGATGT ACGACGAGGC CGTGTCNGTT GTTTTGAAAA CGCGCAAAGC 

2901 CAGCATTTCT GGCGTGCAGC GCGCATTGCG TATCGGCTAT AATCGCGCCG

2951 CGCGTCTGAT TGACCAGATG GAGGCGGAAG GCATTGTGTC CGCACCGGAA 

3001 CACAACGGCA ACCGTACGAT TCTCGTCCCC TTNGACAATG CTTGA 
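Each amino acid listing in these examples is the conceptual translation of the nucleotide sequence that precedes it. The Python sketch below illustrates that step under stated assumptions: the standard genetic code, reading frame 1 starting at the first listed base, '*' for stop codons, and 'X' for any codon containing an ambiguous base such as N (chosen to match how the listings mark such residues). It is not the software used to prepare the listings, and the function name `translate` is illustrative.

```python
# Standard genetic code; codons enumerated with bases in T, C, A, G order
# at the first, second and third positions.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate a DNA reading frame starting at its first base.

    Whitespace is ignored and lower-case bases are accepted. Any codon with
    a base other than A, C, G or T becomes 'X'; stop codons become '*'.
    A trailing incomplete codon is dropped.
    """
    seq = "".join(dna.split()).upper()
    return "".join(
        CODON_TABLE.get(seq[pos:pos + 3], "X") for pos in range(0, len(seq) - 2, 3)
    )


# First 48 nt of ORF58a, and its 3' end including the ambiguous 'N'.
print(translate("ATGTTTTGGA TAGTTTTGAT CGTTATTTTG TTGCTTGCGC TTGCCGGC"))  # MFWIVLIVILLLALAG
print(translate("TTNGACAATG CTTGA"))  # XDNA*
```

The same call on the gonococcal start "ATGTTTTGGA TAGTTTTGAT CGTTATtgtg" yields MFWIVLIVIV, showing how the single TTG to GTG change gives the L/V difference at residue 10.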

This encodes a protein having amino acid sequence <SEQ ID 492>: 

1 MFWIVLIVIL LLALAGLFFV RAQSEREWMR EVSAWQEKKG EKQAELPEIK

51 DGMPDFPELA LMLFHAVKTA VYWLFVGVVR FCRNYLAHES EPDRPVPPAS

101 ANRADVPTAS DGYSDSGNGT EEAETEEAEA AEEEAADTED IATAVIDNRR

151 IPFDRSIAEG LMPSESEISP VRPVFKEITL EEATRALNSA ALRETKKRYI

201 DAFEKNETAV PKVRVSDTPM EGLQIIGLDD PVLQRTYSRM FDADKEAFSE

251 SADYGFEPYF EKQHPSAFSA VKAENARNAP FRRHAGQGKG QAEAKSPDVS

301 QGQSVSDGTA VRDAXRRVSV NLKEPNKATV SAEARISRLI PESRTVVGKR

351 DVEMPSETEN VFTEXVSSVG YGXPVYDETA DIHIEEPAAP DAWVVEPPEV

401 PKVPMPAXDI PPPPPVSEIY NRTYEPPAGF EQVQRSRIAE TDHLADDVLN

451 GGWQEETAAI ANDGSEGVAE RSSGQYLSET EAFGHDSQAV CPFENVPSER

501 PSRRAXDTEA DEGAFQSEET GAVSEHLPTT DLLLPPLFNP GATQTEEXLL

551 XNSITIEEKX AEFKVKVKVV DSYSGPVITR YEIEPDVGVR GNSVLNLEKX

601 LARSLGVASI RVVETILGKT CMGLELPNPK RQMIRLSEIF NSPEFAESKS

651 KLTLALGQDI TGQPVVTDLG KAPHLLVAGT TGSGKSVGVN AMILSMLFKA

701 APEDVRMIMI DPKMLELSIY EGIPHLLAPV VTDMKLAANA LNWCVNEMEK

751 RYRLMSFMGV RNLAGXNQKI AEAAARGEKI GNPFSLTPDN PEPLXKLPFI

801 VVVVDEFADL MMTAGKKIEE LIARLAQKAR AAGIHLILAT QRPSVDVITG

851 LIKANIPTRI AFQVSSKIDS RTILDQMGAE NLLGQGDMLF LPPGTAYPQR

901 VHGAFASDEE VHRVVEYLKQ FGEPDYVDDX LSGGMSDDLL GISRSGDGET

951 DPMYDEAVSV VLKTRKASIS GVQRALRIGY NRAARLIDQM EAEGIVSAPE

1001 HNGNRTILVP XDNA*



ORF58a and ORF58-1 show 96.6% identity in 1014 aa overlap: 



                    10        20        30        40        50        60
orf58a.pep  MFWIVLIVILLLALAGLFFVRAQSEREWMREVSAWQEKKGEKQAELPEIKDGMPDFPELA
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf58-1     MFWIVLIVILLLALAGLFFVRAQSEREWMREVSAWQEKKGEKQAELPEIKDGMPDFPELA
                    10        20        30        40        50        60

                    70        80        90       100       110       120
orf58a.pep  LMLFHAVKTAVYWLFVGVVRFCRNYLAHESEPDRPVPPASANRADVPTASDGYSDSGNGT
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf58-1     LMLFHAVKTAVYWLFVGVVRFCRNYLAHESEPDRPVPPASANRADVPTASDGYSDSGNGT
                    70        80        90       100       110       120

                   130       140       150       160       170       180
orf58a.pep  EEAETEEAEAAEEEAADTEDIATAVIDNRRIPFDRSIAEGLMPSESEISPVRPVFKEITL
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf58-1     EEAETEEAEAAEEEAADTEDIATAVIDNRRIPFDRSIAEGLMPSESEISPVRPVFKEITL
                   130       140       150       160       170       180

                   190       200       210       220       230       240
orf58a.pep  EEATRALNSAALRETKKRYIDAFEKNETAVPKVRVSDTPMEGLQIIGLDDPVLQRTYSRM
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||:|
orf58-1     EEATRALNSAALRETKKRYIDAFEKNETAVPKVRVSDTPMEGLQIIGLDDPVLQRTYSHM
                   190       200       210       220       230       240

                   250       260       270       280       290       300
orf58a.pep  FDADKEAFSESADYGFEPYFEKQHPSAFSAVKAENARNAPFRRHAGQGKGQAEAKSPDVS
            |||||||||||||||||||||||||||||||||||||||||:||||| ||||||||||||
orf58-1     FDADKEAFSESADYGFEPYFEKQHPSAFSAVKAENARNAPFHRHAGQGKGQAEAKSPDVS
                   250       260       270       280       290       300

                   310       320       330       340       350       360
orf58a.pep  QGQSVSDGTAVRDAXRRVSVNLKEPNKATVSAEARISRLIPESRTVVGKRDVEMPSETEN
            |||||||||||||| ||||||||||||||||||||||||||||:||||||||||||||||
orf58-1     QGQSVSDGTAVRDARRRVSVNLKEPNKATVSAEARISRLIPESQTVVGKRDVEMPSETEN
                   310       320       330       340       350       360

                   370       380       390       400       410       420
orf58a.pep  VFTEXVSSVGYGXPVYDETADIHIEEPAAPDAWVVEPPEVPKVPMPAXDIPPPPPVSEIY
            |||| ||||||| |||||||||||||||||||||||||||||||| | || |||||||||
orf58-1     VFTETVSSVGYGGPVYDETADIHIEEPAAPDAWVVEPPEVPKVPMTAIDIQPPPPVSEIY
                   370       380       390       400       410       420

                   430       440       450       460       470       480
orf58a.pep  NRTYEPPAGFEQVQRSRIAETDHLADDVLNGGWQEETAAIANDGSEGVAERSSGQYLSET
            |||||||:|||||||||||||||||||||||||||||||||:|||||:||||||||||||
orf58-1     NRTYEPPSGFEQVQRSRIAETDHLADDVLNGGWQEETAAIADDGSEGAAERSSGQYLSET
                   430       440       450       460       470       480

                   490       500       510       520       530       540
orf58a.pep  EAFGHDSQAVCPFENVPSERPSRRAXDTEADEGAFQSEETGAVSEHLPTTDLLLPPLFNP
            |||||||||||||||||||||| |: ||||||||| ||||||||||||||||||||||||
orf58-1     EAFGHDSQAVCPFENVPSERPSCRVSDTEADEGAFPSEETGAVSEHLPTTDLLLPPLFNP
                   490       500       510       520       530       540

                   550       560       570       580       590       600
orf58a.pep  GATQTEEXLLXNSITIEEKXAEFKVKVKVVDSYSGPVITRYEIEPDVGVRGNSVLNLEKX
             |||||| || |||||||| |||||||||||||||||||||||||||||||||||||||
orf58-1     EATQTEEELLENSITIEEKLAEFKVKVKVVDSYSGPVITRYEIEPDVGVRGNSVLNLEKD
                   550       560       570       580       590       600

                   610       620       630       640       650       660
orf58a.pep  LARSLGVASIRVVETILGKTCMGLELPNPKRQMIRLSEIFNSPEFAESKSKLTLALGQDI
            |||||||||||||||| |||||||||||||||||||||||||||||||||||||||||||
orf58-1     LARSLGVASIRVVETIPGKTCMGLELPNPKRQMIRLSEIFNSPEFAESKSKLTLALGQDI
                   610       620       630       640       650       660

                   670       680       690       700       710       720
orf58a.pep  TGQPVVTDLGKAPHLLVAGTTGSGKSVGVNAMILSMLFKAAPEDVRMIMIDPKMLELSIY
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf58-1     TGQPVVTDLGKAPHLLVAGTTGSGKSVGVNAMILSMLFKAAPEDVRMIMIDPKMLELSIY
                   670       680       690       700       710       720

                   730       740       750       760       770       780
orf58a.pep  EGIPHLLAPVVTDMKLAANALNWCVNEMEKRYRLMSFMGVRNLAGXNQKIAEAAARGEKI
            ||||||||||||||||||||||||||||||||||||||||||||| ||||||||||||||
orf58-1     EGIPHLLAPVVTDMKLAANALNWCVNEMEKRYRLMSFMGVRNLAGFNQKIAEAAARGEKI
                   730       740       750       760       770       780

                   790       800       810       820       830       840
orf58a.pep  GNPFSLTPDNPEPLXKLPFIVVVVDEFADLMMTAGKKIEELIARLAQKARAAGIHLILAT
            |||||||||:|||| |||||||||||||||||||||||||||||||||||||||||||||
orf58-1     GNPFSLTPDDPEPLEKLPFIVVVVDEFADLMMTAGKKIEELIARLAQKARAAGIHLILAT
                   790       800       810       820       830       840

                   850       860       870       880       890       900
orf58a.pep  QRPSVDVITGLIKANIPTRIAFQVSSKIDSRTILDQMGAENLLGQGDMLFLPPGTAYPQR
            ||||||||||||||||||||||||||||||||||||||||||||||||||| ||||||||
orf58-1     QRPSVDVITGLIKANIPTRIAFQVSSKIDSRTILDQMGAENLLGQGDMLFLLPGTAYPQR
                   850       860       870       880       890       900

                   910       920       930       940       950       960
orf58a.pep  VHGAFASDEEVHRVVEYLKQFGEPDYVDDXLSGGMSDDLLGISRSGDGETDPMYDEAVSV
            ||||||||||||||||||||||||||||| |||| |::| || |||| ||||||||||||
orf58-1     VHGAFASDEEVHRVVEYLKQFGEPDYVDDILSGGGSEELPGIGRSGDDETDPMYDEAVSV
                   910       920       930       940       950       960

                   970       980       990      1000      1010
orf58a.pep  VLKTRKASISGVQRALRIGYNRAARLIDQMEAEGIVSAPEHNGNRTILVPXDNAX
            |||||||||||||||||||||||||||||||||||||||||||||||||| |||
orf58-1     VLKTRKASISGVQRALRIGYNRAARLIDQMEAEGIVSAPEHNGNRTILVPLDNAX
                   970       980       990      1000      1010



Homology with a predicted ORF from N. gonorrhoeae

ORF58 shows complete identity over a 9aa overlap with a predicted ORF (ORF58ng) from N. 
gonorrhoeae: 

orf58.pep   ALMLFHAVKTAVYWLFVGVVRFCRNYLAHESEPDRPVPP 103
                                          |||||||||
orf58ng                                   SEPDRPVPPASANRADVPTASDGYSDSGNG 30

The ORF58ng nucleotide sequence <SEQ ID 493> is predicted to encode a protein having partial 
amino acid sequence <SEQ ID 494>: 



1 ..SEPDRPVPPA SANRADVPTA SDGYSDSGNG TEEAETEAAE AAEEEAADTE

51 DIATAVIDNR RIPFDRSIAE GLMQSESKTS PVRPVFKEIT LEEATRALSS

101 AALRETKKRY IDAFEKNGTA VPKVRVSDTP MEGLQIIGLD DPVLQRTYSR

151 MFDADKEAFS ESADYGFEPY FEKQHPSAFS AVKAENARNA PFRRHAGQEK

201 GQAEAKSPDV SQGQSVSDGT AVRDARRRVS VNLKEPNKAT VSAEARISRL

251 IPESRTVVGK RDVEMPSETE NVFTETVSSV GYGGPVYDEA ADIHIEEPAA

301 PDAWVVEPPE VPEVAVPEID ILPPPPVSEI YNRTYEPPAG FEQAQRSRIA
351 ETDHLAADVL NGGWQEETAA IADDGSEGAA ERSSGQYLSE TEAFGHDSQA

401 VCPFEDVPSE RPSCRVSDTE ADEGAFQSEE TGAVSEHLPT TDLLLPPLFN 

451 PEATQTEEEL LENSITIEEK LAEFKVKVKV VDSYSGPVIT RYEIEPDVGV 

501 RGNSVLNLEK DLARSLGVAS IRVVETIPGK TCMGLELPNP KRQMIRLSEI

551 FNSPEFAESK SKLTLALGQD ITGQPVVTDL GKAPHLLVAG TTGSGKSVGV

601 NAMILSMLFK AAPEDVRMIM IDPKMLELSI YEGITHLLAP VVTDMKLAAN

651 ALNWCVNEME KRYRLMSFMG VRNLAGFNQK IAEAAARGEK IGNPFSLTPD

701 DPEPLEKLPF IVVVVDEFAD LMMTAGKKIE ELIARLAQKA RAAGIHLILA

751 TQRPSVDVIT GLIKANIPTR IAFQVSSKID SRTILDQMGA ENLLGQGDML

801 FLPPGTAYPQ RVHGAFASDE EVHRVVEYLK QFGEPDYVDD ILSGGGSEEL

851 PGIGRSGDGE TDPMYDEAVS VVLKTRKASI SGVQRALRIG YNRAARLIDQ

901 MEAEGIVSAP EHNGNRTILV PLDNA* 
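The partial ORF58ng sequence is described as carrying an ATP/GTP-binding site motif A (P-loop). One way to locate such a motif is to scan for the PROSITE ATP_GTP_A pattern, [AG]-x(4)-G-K-[ST] (entry PS00017). The Python sketch below assumes that pattern and the 1-based numbering of the partial sequence; the specification does not say which program or pattern produced its own prediction, and `find_p_loops` is a hypothetical helper.

```python
import re

# PROSITE ATP_GTP_A (PS00017) pattern [AG]-x(4)-G-K-[ST] as a regex.
# The zero-width lookahead lets overlapping occurrences be reported.
P_LOOP = re.compile(r"(?=([AG].{4}GK[ST]))")


def find_p_loops(protein: str, first_residue: int = 1) -> list[tuple[int, str]]:
    """Return (1-based start position, matched motif) for each P-loop hit."""
    return [(m.start() + first_residue, m.group(1)) for m in P_LOOP.finditer(protein)]


# Residues 581-600 of the partial ORF58ng sequence.
print(find_p_loops("GKAPHLLVAGTTGSGKSVGV", first_residue=581))  # [(590, 'GTTGSGKS')]
```

The hit GTTGSGKS falls in a stretch that is identical in FtsK (ORF58ng 587-646 against FtsK 988-1047), consistent with the conserved nucleotide-binding region of FtsK-family proteins.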

This partial gonococcal sequence contains a predicted transmembrane region and a predicted
ATP/GTP-binding site motif A (P-loop; double underlined). Furthermore, it has a domain
homologous to the FtsK cell division protein of E. coli. Alignment of ORF58ng and FtsK
(accession number P46889) shows 65% amino acid identity in a 459 aa overlap:

ORF58ng:   467 IEEKLAEFKVKVKVVDSYSGPVITRYEIEPDVGVRGNSVLNLEKDLARSLGVASIRVVET 526
               +E +LA+F++K  VV+   GPVITR+E+    GV+   + NL +DLARSL   ++RVVE
FtsK:      868 VEARLADFRIKADVVNYSPGPVITRFELNLAPGVKAARISNLSRDLARSLSTVAVRVVEV 927

ORF58ng:   527 IPGKTCMGLELPNPKRQMIRLSEIFNSPEFAESKSKLTLALGQDITGQPVVTDLGKAPHL 586
               IPGK  +GLELPN KRQ + L E+ ++ +F ++ S LT+ LG+DI G+PVV DL K PHL
FtsK:      928 IPGKPYVGLELPNKKRQTVYLREVLDNAKFRDNPSPLTVVLGKDIAGEPVVADLAKMPHL 987

ORF58ng:   587 LVAGTTGSGKSVGVNAMILSMLFKAAPEDVRMIMIDPKMLELSIYEGITHLLAPVVTDMK 646
               LVAGTTGSGKSVGVNAMILSML+KA PEDVR IMIDPKMLELS+YEGI HLL  VVTDMK
FtsK:      988 LVAGTTGSGKSVGVNAMILSMLYKAQPEDVRFIMIDPKMLELSVYEGIPHLLTEVVTDMK 1047

ORF58ng:   647 LAANALNWCVNEMEKRYRLMSFMGVRNLAGFNQKIAEAAARGEKIGNPFSLTPDDPEP-- 704
                AANAL WCVNEME+RY+LMS +GVRNLAG+N+KIAEA      I +P+    D  +
FtsK:     1048 DAANALRWCVNEMERRYKLMSALGVRNLAGYNEKIAEADRMMRPIPDPYWKPGDSMDAQH 1107

ORF58ng:   705 --LEKLPFIVVVVDEFADLMMTAGKKIEELIARLAQKARAAGIHLILATQRPSVDVITGL 762
                 L+K P+IVV+VDEFADLMMT GKK+EELIARLAQKARAAGIHL+LATQRPSVDVITGL
FtsK:     1108 PVLKKEPYIVVLVDEFADLMMTVGKKVEELIARLAQKARAAGIHLVLATQRPSVDVITGL 1167

ORF58ng:   763 IKANIPTRIAFQVSSKIDSRTILDQMGAENLLGQGDMLFLPPGTAYPQRVHGAFASDEEV 822
               IKANIPTRIAF VSSKIDSRTILDQ GAE+LLG GDML+  P +  P RVHGAF  D+EV
FtsK:     1168 IKANIPTRIAFTVSSKIDSRTILDQAGAESLLGMGDMLYSGPNSTLPVRVHGAFVRDQEV 1227

ORF58ng:   823 HRVVEYLKQFGEPDYVDDILSGGGSEELPGIGRSGDGETDPMYDEAVSVVLKTRKASISG 882
               H VV+  K  G P YVD I S   SE   G G  G  E DP++D+AV  V + RKASISG
FtsK:     1228 HAVVQDWKARGRPQYVDGITSDSESEGGAG-GFDGAEELDPLFDQAVQFVTEKRKASISG 1286

ORF58ng:   883 VQRALRIGYNRAARLIDQMEAEGIVSAPEHNGNRTILVP 921
               VQR  RIGYNRAAR+I+QMEA+GIVS   HNGNR +L P
FtsK:     1287 VQRQFRIGYNRAARIIEQMEAQGIVSEQGHNGNREVLAP 1325
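The ORF58ng/FtsK comparison uses a BLAST-style layout, in which a letter in the middle row marks an identical residue, '+' a substitution with a positive score, and '-' in either sequence row a gap. The following sketch is a hypothetical helper that assumes exactly this layout and tallies one block; summing the tallies over every block gives the totals behind an overall identity figure.

```python
def tally_blast_block(query: str, midline: str, subject: str) -> dict[str, int]:
    """Count identities, positives and gap columns in one BLAST-style block.

    Assumed layout: a letter in the midline is an identical residue, '+' a
    positive-scoring substitution, a space anything else; '-' in either
    sequence row marks a gap.
    """
    length = max(len(query), len(subject))
    identities = sum(1 for ch in midline if ch.isalpha())
    positives = identities + midline.count("+")
    gaps = sum(
        1 for q, s in zip(query.ljust(length), subject.ljust(length)) if "-" in (q, s)
    )
    return {"length": length, "identities": identities, "positives": positives, "gaps": gaps}


# Last block of the ORF58ng/FtsK alignment (ORF58ng 883-921, FtsK 1287-1325).
block = tally_blast_block(
    "VQRALRIGYNRAARLIDQMEAEGIVSAPEHNGNRTILVP",
    "VQR  RIGYNRAAR+I+QMEA+GIVS   HNGNR +L P",
    "VQRQFRIGYNRAARIIEQMEAQGIVSEQGHNGNREVLAP",
)
print(block)  # {'length': 39, 'identities': 28, 'positives': 32, 'gaps': 0}
```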

Further work on ORF58ng revealed the complete gonococcal DNA sequence to be <SEQ ID 495>: 

1 ATGTTTTGGA TAGTTTTGAT CGTTATtgtg TTGCTTGCGC TTGCCGGCCT 

51 GTTTTTTGTC CGCGCACAAT CCGAACGCGA GTGGATGCGC GAGGTTTCTG 

101 CGTGGCAGGA AAAGAAAGGG GAAAAACAGG CGGAGCTGCC TGAAATCAAA 

151 GACGGTATGC CCGATTTTCC CGAGTTTTCC CTGATGCTTT TCCATGCCGT 

201 CAAAACGGCA GTGTATTGGC TGTTTGTCGG TGTCGTCCGT TTCTGCCGAA 

251 ACTATCTGGC GCACGAATCC GAACCGGACA GGCCCGTTCC GCCTGCTTCT 

301 GCAAACCGTG CGGATGTTCC GACCGCATCC GACGGGTATT CAGACAGTGG 

351 AAACGGGACG GAAGAAGCGG AAACGGAAGC AGCAGAAGCT GCGGAGGAAG 

401 AGGCTGCCgA TACgGAAGAC ATTGCAACTG CCGTAATCGA CAACCGCCGC 

451 ATCCcatTCG ACCGGAGTAT TGCTGAAGGG TTGATGCAGT CTGAAAGCAA 

501 AACTTCGCCC GTCCGTCCGG TTTTTAAGGA AATCACTTTG GAAGAAGCAA 

551 CGCGTGCTTT AAGCAGCGCG GCTTTAAGGG AAACGAAAAA ACGCTATATC 

601 GATGCATTTG AGAAAAACGG AACAGCCGTC CCCAAAGTAC GCGTGTCCGA 

651 TACCCCGATG GAAGGGCTGC AGATTATCGG TTTGGACGAC CCTGTGCTTC 

701 AACGCACGTA TTCCCGTATG TTTGATGCGG ACAAAGAAGC GTTTTCCGAG 

751 TCTGCGGATT ACGGATTTGA GCCGTATTTT GAGAAGCAGC ATCCGTCTGC 
801 CTTTTCTGCA GTCAAAGCCG AAAATGCACG GAATGCGCCG TTCCGCCGTC

851 ATGCAGGGCA GGAGAAAGGG CAGGCGGAGG CAAAATCCCC GGATGTTTCC 

901 CAAGGGCAGT CCGTTTCAGA CGGCACAGCC GTCCGCGATG CCCGCCGCCG 

951 CGTTTCCGTC AATTTGAAAG AACCGAACAA GGCAACGGTT TCTGCGGAGG 

1001 CGCGGATTTC GCGCCTGATT CCGGAAAGTC GGACGGTTGT CGGGAAACGG 

1051 GATGTCGAAA TGCCGTCTGA AACCGAAAAT GTTTTCACGG AAACCGTTTC 

1101 GTCTGTGGGA TACGGCGGTC CGGTTTATGA TGAAGCTGCC GATATCCATA 

1151 TTGAAGAGCC TGCCGCGCCC GATGCTTGGG TGGTCGAACC ACCCGAAGTG 

1201 CCGGAGGTAG CCGTACCCGA AATCGATATT CTGCCGCCGC CTCCCGTATC 

1251 GGAAATCTAC AACCGTACCT ATGAGCCGCC GGCAGGATTC GAGCAGGCGC 

1301 AACGCAGCCG CATTGCCGAA ACCGACCATC TTGCCGCTGA TGTTTTGAAT 

1351 GGAGGTTGGC AGGAGGAAAC CGCCGCTATT GCAGATGACG GCAGTGAGGG 

1401 TGCGGCAGAG CGGTCAAGCG GGCAATATCT GTCGGAAACC GAAGCGTTCG 

1451 GGCATGACAG TCAGGCGGTT TGTCCGTTTG AAGATGTGCC GTCTGAACGC 

1501 CCGTCCTGCC GGGTATCGGA TACGGAAGCG GATGAAGGGG CGTTCCAATC 

1551 GGAAGAGACC GGTGCGGTAT CCGAACACCT GCCGACAACC GACCTGCTTC 

1601 TGCCTCCGCT GTTCAATCCC GAGGCGACGC AAACCGAAGA AGAACTGTTG 

1651 GAAAACAGCA TCACCATCGA AGAAAAATTG GCGGAGTTCA AAGTCAAGGT 

1701 CAAGGTTGTC GATTCTTATT CCGGCCCCGT GATTACGCGT TATGAAATCG 

1751 AACCCGATGT CGGCGTGCGC GGCAATTCCG TTCTGAATTT GGAAAAAGAC 

1801 TTGGCGCGTT CGCTCGGCGT GGCTTCCATC CGCGTTGTCG AAACCATCCC 

1851 CGGCAAAACC TGCATGGGTT TGGAACTTCC GAACCCGAAA CGCCAAATGA 

1901 TACGCCTGAG CGAAATTTTC AATTCGCCCG AGTTTGCCGA ATCCAAATCC 

1951 AAGCTGACGC TCGCGCTCGG TCAGGACATT ACCGGACAGC CCGTCGTAAC 

2001 CGACTTGGGC AAAGCACCGC ATTTGCTGGT TGCCGGCACG ACCGGTTCGG 

2051 GCAAATCGGT GGGTGTCAAC GCGATGATTC TGTCTATGCT TTTCAAAGCC 

2101 GCGCCGGAAG ACGTGCGTAT GATTATGATC GATCCGAAAA TGCTGGAATT 

2151 GAGCATTTAC GAAGGCATCA CGCACCTGCT CGCCCCTGTC GTTACCGATA 

2201 TGAAGCTGGC GGCAAACGCG CTGAACTGGT GTGTTAACGA AATGGAAAAA 

2251 CGCTACCGCC TGATGAGCTT TATGGGCGTG CGCAATCTTG CGGGCTTCAA 

2301 CCAAAAAATC GCCGAAGCCG CAGCAAGGGG AGAAAAAATC GGCAATCCGT 

2351 TCAGCCTCAC GCCCGACGAT CCCGAACCTT TGGAAAAACT GCCGTTTATC 

2401 GTGGTCGTGG TCGATGAGTT TGCCGATTTG ATGATGACGG CAGGCAAGAA 

2451 AATCGAAGAA CTGATTGCGC GCCTCGCCCA AAAAGCCCGC GCGGCAGGCA

2501 TCCACCTTAT CCTTGCCACA CAACGCCCCA GCGTCGATGT CATCACGGGT 

2551 CTGATTAAGG CGAACATCCC GACGCGTATC GCGTTCCAAG TGTCCAGCAA 

2601 AATCGACAGC CGCACGATTC TCGAGCAAAT GGGCGCGGAA AACCTGCTCG 

2651 GTCAGGGCGA TATGCTGTTC CTGCCGCCGG GTACTGCCTA TCCGCAGCGC 

2701 GTTCACGGCG CGTTTGCCTC GGATGAAGAG GTGCACCGCG TGGTCGAATA 

2751 TCTGAAGCAG TTTGGCGAGC CGGACTATGT TGACGATATT TTGAGCGGCG 

2801 GCGGCAGCGA AGAGCTGCCC GGCATCGGGC GCAGCGGCGA CGGCGAAACC 

2851 GATCCGATGT ACGACGAGGC CGTATCCGTT GTCCTGAAAA CGCGCAAAGC 

2901 CAGCATTTCG GGCGTACAGC GCGCCTTGCG CATCGGCTAC AACCGCGCCG 

2951 CGCGTCTGAT TGACCAAATG GAAGCGGAAG GCATTGTGTC CGCACCGGAA 

3001 CACAACGGCA ACCGTACGAT TCTCGTCCCC TTGGACAATG CTTGA 

This corresponds to the amino acid sequence <SEQ ID 496; ORF58ng-1>:



1 MFWIVLIVIV LLALAGLFFV RAQSEREWMR EVSAWQEKKG EKQAELPEIK

51 DGMPDFPEFS LMLFHAVKTA VYWLFVGVVR FCRNYLAHES EPDRPVPPAS

101 ANRADVPTAS DGYSDSGNGT EEAETEAAEA AEEEAADTED IATAVIDNRR

151 IPFDRSIAEG LMQSESKTSP VRPVFKEITL EEATRALSSA ALRETKKRYI

201 DAFEKNGTAV PKVRVSDTPM EGLQIIGLDD PVLQRTYSRM FDADKEAFSE

251 SADYGFEPYF EKQHPSAFSA VKAENARNAP FRRHAGQEKG QAEAKSPDVS

301 QGQSVSDGTA VRDARRRVSV NLKEPNKATV SAEARISRLI PESRTVVGKR

351 DVEMPSETEN VFTETVSSVG YGGPVYDEAA DIHIEEPAAP DAWVVEPPEV

401 PEVAVPEIDI LPPPPVSEIY NRTYEPPAGF EQAQRSRIAE TDHLAADVLN

451 GGWQEETAAI ADDGSEGAAE RSSGQYLSET EAFGHDSQAV CPFEDVPSER

501 PSCRVSDTEA DEGAFQSEET GAVSEHLPTT DLLLPPLFNP EATQTEEELL

551 ENSITIEEKL AEFKVKVKVV DSYSGPVITR YEIEPDVGVR GNSVLNLEKD

601 LARSLGVASI RVVETIPGKT CMGLELPNPK RQMIRLSEIF NSPEFAESKS

651 KLTLALGQDI TGQPVVTDLG KAPHLLVAGT TGSGKSVGVN AMILSMLFKA

701 APEDVRMIMI DPKMLELSIY EGITHLLAPV VTDMKLAANA LNWCVNEMEK

751 RYRLMSFMGV RNLAGFNQKI AEAAARGEKI GNPFSLTPDD PEPLEKLPFI

801 VVVVDEFADL MMTAGKKIEE LIARLAQKAR AAGIHLILAT QRPSVDVITG

851 LIKANIPTRI AFQVSSKIDS RTILDQMGAE NLLGQGDMLF LPPGTAYPQR

901 VHGAFASDEE VHRVVEYLKQ FGEPDYVDDI LSGGGSEELP GIGRSGDGET

951 DPMYDEAVSV VLKTRKASIS GVQRALRIGY NRAARLIDQM EAEGIVSAPE

1001 HNGNRTILVP LDNA*



ORF58ng-1 and ORF58-1 show 97.2% identity in 1014 aa overlap:
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10 20 30 40 50 60 

orf 58-1 . pep MFWIVLIVILLLALAGLFFVRAQSEREWMREVSAWQEKKGEKQAELPEIKDGMPDFPELA 

I I I I I I I I I : I I I I I I I I I I I I I I I I I I li I I I I I I I I I I I I I I I I I I I I I I I I I I I I : : 
orf58ng-l MFWIVLIVIVLLALAGLFFVRAQSEREWMREVSAWQEKKGEKQAELPEIKDGMPDFPEFS 

10 20 30 40 50 60 

70 80 90 100 110 120 

orf 58-1 . pep LMLFHAVKTAVYWLFVGWRFCRNYLAHESEPDRPVPPASANRADVPTASDGYSDSGNGT 
I | I I I I I I I I I I I i I I I I I I I I I I I I 1 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
orf58ng-l LMLFHAVKTAVYWLFVGWRFCRNYLAHESEPDRPVPPASANRADVPTASDGYSDSGNGT 

70 80 90 100 110 120 

130 140 150 160 170 180 

orf 58-1 . pep EEAETEEAEAAEEEAADTEDIATAVI DNRRI PFDRS I AEGLMPSESE I S PVRPVFKEITL 
11)111 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I t I I I I I ill: IMIIII1IIM 
orf58ng-l EEAETEAAEAAEEEAADTEDIATAVI DNRRI PFDRS IAEGLMQSESKTS PVRPVFKEITL 

130 140 150 160 170 180 

190 200 210 220 230 240 

orf 58-1 . pep EEATRALNSAALRETKKRYIDAFEKNETAVPKVRVSDTPMEGLQIIGLDDPVLQRTYSHM 
I I I || I I : I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I : I 
orf58ng-l EEATRALSSAALRETKKRYIDAFEKNGTAVPKVRVSDTPMEGLQIIGLDDPVLQRTYSRM 

190 200 210 220 230 240 

250 260 270 280 290 300 

orf58-1.pep  FDADKEAFSESADYGFEPYFEKQHPSAFSAVKAENARNAPFHRHAGQGKGQAEAKSPDVS
             |||||||||||||||||||||||||||||||||||||||||:||||| ||||||||||||
orf58ng-1    FDADKEAFSESADYGFEPYFEKQHPSAFSAVKAENARNAPFRRHAGQEKGQAEAKSPDVS

250 260 270 280 290 300 

310 320 330 340 350 360 

orf58-1.pep  QGQSVSDGTAVRDARRRVSVNLKEPNKATVSAEARISRLIPESQTVVGKRDVEMPSETEN
             |||||||||||||||||||||||||||||||||||||||||||:||||||||||||||||
orf58ng-1    QGQSVSDGTAVRDARRRVSVNLKEPNKATVSAEARISRLIPESRTVVGKRDVEMPSETEN

310 320 330 340 350 360 

370 380 390 400 410 420 

orf58-1.pep  VFTETVSSVGYGGPVYDETADIHIEEPAAPDAWVVEPPEVPKVPMTAIDIQPPPPVSEIY
             ||||||||||||||||||:||||||||||||||||||||||:| :  ||| |||||||||
orf58ng-1    VFTETVSSVGYGGPVYDEAADIHIEEPAAPDAWVVEPPEVPEVAVPEIDILPPPPVSEIY

370 380 390 400 410 420 

430 440 450 460 470 480 

orf58-1.pep  NRTYEPPSGFEQVQRSRIAETDHLADDVLNGGWQEETAAIADDGSEGAAERSSGQYLSET
             |||||||:||||:|||||||||||| ||||||||||||||||||||||||||||||||||
orf58ng-1    NRTYEPPAGFEQAQRSRIAETDHLAADVLNGGWQEETAAIADDGSEGAAERSSGQYLSET

430 440 450 460 470 480 

490 500 510 520 530 540 

orf58-1.pep  EAFGHDSQAVCPFENVPSERPSCRVSDTEADEGAFPSEETGAVSEHLPTTDLLLPPLFNP
             ||||||||||||||:|||||||||||||||||||| ||||||||||||||||||||||||
orf58ng-1    EAFGHDSQAVCPFEDVPSERPSCRVSDTEADEGAFQSEETGAVSEHLPTTDLLLPPLFNP

490 500 510 520 530 540 

550 560 570 580 590 600 

orf58-1.pep  EATQTEEELLENSITIEEKLAEFKVKVKVVDSYSGPVITRYEIEPDVGVRGNSVLNLEKD
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf58ng-1    EATQTEEELLENSITIEEKLAEFKVKVKVVDSYSGPVITRYEIEPDVGVRGNSVLNLEKD

550 560 570 580 590 600 

610 620 630 640 650 660 

orf58-1.pep  LARSLGVASIRVVETIPGKTCMGLELPNPKRQMIRLSEIFNSPEFAESKSKLTLALGQDI
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf58ng-1    LARSLGVASIRVVETIPGKTCMGLELPNPKRQMIRLSEIFNSPEFAESKSKLTLALGQDI

610 620 630 640 650 660 

670 680 690 700 710 720 

orf58-1.pep  TGQPVVTDLGKAPHLLVAGTTGSGKSVGVNAMILSMLFKAAPEDVRMIMIDPKMLELSIY
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf58ng-1    TGQPVVTDLGKAPHLLVAGTTGSGKSVGVNAMILSMLFKAAPEDVRMIMIDPKMLELSIY

670 680 690 700 710 720 
730 740 750 760 770 780

orf58-1.pep  EGIPHLIAPVVTDMKIAANALNWCVNEMEKRYRLMSFMGVRNLAGFNQKIAEAAARGEKI
             ||| ||:|||||||:|||||||||||||||||||||||||||||||||||||||||||||
orf58ng-1    EGITHLLAPVVTDMKLAANALNWCVNEMEKRYRLMSFMGVRNLAGFNQKIAEAAARGEKI

730 740 750 760 770 780 

790 800 810 820 830 840 

orf58-1.pep  GNPFSLTPDDPEPLEKLPFIVVVVDEFADLMMTAGKKIEELIARLAQKARAAGIHLILAT
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf58ng-1    GNPFSLTPDDPEPLEKLPFIVVVVDEFADLMMTAGKKIEELIARLAQKARAAGIHLILAT

790 800 810 820 830 840 

850 860 870 880 890 900 

orf58-1.pep  QRPSVDVITGLIKANIPTRIAFQVSSKIDSRTILDQMGAENLLGQGDMLFLLPGTAYPQR
             ||||||||||||||||||||||||||||||||||||||||||||||||||| ||||||||
orf58ng-1    QRPSVDVITGLIKANIPTRIAFQVSSKIDSRTILDQMGAENLLGQGDMLFLPPGTAYPQR

850 860 870 880 890 900 

910 920 930 940 950 960 

orf58-1.pep  VHGAFASDEEVHRVVEYLKQFGEPDYVDDILSGGGSEELPGIGRSGDDETDPMYDEAVSV
             ||||||||||||||||||||||||||||||||||||||||||||||| ||||||||||||
orf58ng-1    VHGAFASDEEVHRVVEYLKQFGEPDYVDDILSGGGSEELPGIGRSGDGETDPMYDEAVSV

910 920 930 940 950 960 

970 980 990 1000 1010 

orf58-1.pep  VLKTRKASISGVQRALRIGYNRAARLIDQMEAEGIVSAPEHNGNRTILVPLDNAX
             |||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf58ng-1    VLKTRKASISGVQRALRIGYNRAARLIDQMEAEGIVSAPEHNGNRTILVPLDNAX

970 980 990 1000 1010 
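The "% identity in N aa overlap" figure summarizes the match lines above: it is the fraction of aligned columns whose two residues are identical. The sketch below computes that fraction for a pair of already-aligned rows. It is a minimal illustration that assumes gap columns ('-') still count toward the overlap and that 'X' never counts as an identity. The function name is hypothetical, and this is not the program that produced these alignments.

```python
def percent_identity(top: str, bottom: str) -> tuple[int, int, float]:
    """Return (identical positions, overlap length, percent identity) for two aligned rows.

    Assumptions (not taken from the original analysis software):
    - both rows are already aligned and have the same length;
    - '-' marks a gap, and gap columns still count toward the overlap;
    - 'X' (unknown residue or stop) never counts as an identity.
    """
    if len(top) != len(bottom):
        raise ValueError("aligned rows must have equal length")
    identical = sum(1 for a, b in zip(top, bottom) if a == b and a not in "-X")
    overlap = len(top)
    return identical, overlap, 100.0 * identical / overlap


if __name__ == "__main__":
    # First 60-residue block of the ORF58-1 / ORF58ng-1 alignment above.
    orf58_1 = "MFWIVLIVILLLALAGLFFVRAQSEREWMREVSAWQEKKGEKQAELPEIKDGMPDFPELA"
    orf58ng_1 = "MFWIVLIVIVLLALAGLFFVRAQSEREWMREVSAWQEKKGEKQAELPEIKDGMPDFPEFS"
    ident, overlap, pct = percent_identity(orf58_1, orf58ng_1)
    print(f"{ident}/{overlap} = {pct:.1f}%")  # 57/60 = 95.0%
```

Summing identities over every block and dividing by the full overlap gives the overall figure. A value of 97.2% over 1014 aa corresponds to roughly 986 identical positions.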



Furthermore, ORF58ng-1 shows significant homology to the E. coli protein FtsK:

sp|P46889|FTSK_ECOLI CELL DIVISION PROTEIN FTSK >gi|1651412|gnl|PID|d1015290 (Dl
division protein FtsK [Escherichia coli] >gi|1651418|gnl|PID|d1015296 (D90727) Cell
division protein FtsK [Escherichia coli] >gi|1787117 (AE000191) cell division
protein FtsK [Escherichia coli] Length = 1329
Score = 576 bits (1469), Expect = e-163

Identities = 301/459 (65%), Positives = 353/459 (76%), Gaps = 5/459 (1%)

Query: 556 IEEKLAEFKVKVKWDSYSGPVITRYEIEPDVGVRGNSVLNLEKDLARSLGVASIRWET 615 

+E +LA+F++K W+ GPVITR+E+ GV+ + NL +DLARSL ++RWE 
Sbjct: 868 VEARLADFRIKADWNYSPGPVITRFELNLAPGVKAARISNLSRDLARSLSTVAVRWEV 927 

Query: 616 IPGKTCMGLELPNPKRQMIRLSEIFNSPEFAESKSKLTLALGQDITGQPWTDLGKAPHL 675

IPGK +GLELPN KRQ + L E+ ++ +F ++ S LT+ LG+DI G+PW DL K PHL 
Sbjct: 928 IPGKPYVGLELPNKKRQTVYLREVLDNAKFRDNPSPLTWLGKDIAGEPWADLAKMPHL 987

Query: 676 LVAGTTGSGKSVGVNAMILSMLFKAAPEDVRMIMIDPKMLELSIYEGITHLLAPWTDMK 735 

LVAGTTGSGKSVGVNAMILSML+KA PEDVR IMIDPKMLELS+YEGI HLL WTDMK 
Sbjct: 988 LVAGTTGSGKSVGVNAMILSMLYKAQPEDVRFIMIDPKMLELSVYEGIPHLLTEWTDMK 1047 

Query: 736 LAANALNWCVNEMEKRYRLMSFMGVRNLAGFNQKIAEAAARGEKIGNPFSLTPDDPEP-- 793

AANAL WCVNEME+RY+LMS +GVRNLAG+N+KIAEA I +P+ D + 

Sbjct: 1048 DAANALRWCVNEMERRYKLMSALGVRNLAGYNEKIAEADRMMRPIPDPYWKPGDSMDAQH 1107 

Query: 794 --LEKLPFIVVVVDEFADLMMTAGKKIEELIARLAQKARAAGIHLILATQRPSVDVITGL 851

L+K P+IW+VDEFADLMMT GKK+EELIARLAQKARAAGIHL+LATQRPSVDVITGL 
Sbjct: 1108 PVLKKEPYIWLVDEFADLMMTVGKKVEELIARLAQKARAAGIHLVLATQRPSVDVITGL 1167 

Query: 852 IKANIPTRIAFQVSSKIDSRTILDQMGAENLLGQGDMLFLPPGTAYPQRVHGAFASDEEV 911 

IKANIPTRIAF VSSKIDSRTILDQ GAE+LLG GDML+ P + P RVHGAF D+EV 
Sbjct: 1168 IKANIPTRIAFTVSSKIDSRTILDQAGAESLLGMGDMLYSGPNSTLPVRVHGAFVRDQEV 1227 

Query: 912 HRWEYLKQFGEPDYVDDILSGGGSEELPGIGRSGDGETDPMYDEAVSWLKTRKASISG 971 

H W+ K G P YVD IS SE G G G E DP++D+AV V + RKASISG 
Sbjct: 1228 HAWQDWKARGRPQYVDGITSDSESEGGAG-GFDGAEELDPLFDQAVQFVTEKRKASISG 1286 

Query: 972 VQRALRIGYNRAARLIDQMEAEGIVSAPEHNGNRTILVP 1010 

VQR RIGYNRAAR+I+QMEA+GIVS HNGNR +L P 
Sbjct: 1287 VQRQFRIGYNRAARIIEQMEAQGIVSEQGHNGNREVLAP 1325 
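The BLAST summary line prints integer percentages. These appear to be truncated, not rounded: 353/459 is 76.9% but is printed as 76%. Each Query/Sbjct row also has start and end coordinates that count residues only, so gap characters do not advance the position. The helpers below are a hypothetical sketch of both conventions. Reading the dashes in the Query 736 and 794 rows as two '-' gap characters is an assumption, because those rows are garbled in this copy.

```python
def blast_percent(count: int, length: int) -> int:
    """Integer percentage as printed in the report above (truncated, not rounded)."""
    return 100 * count // length


def alignment_end(start: int, row: str) -> int:
    """Coordinate of the last residue in an alignment row; '-' gap characters are skipped."""
    residues = sum(1 for ch in row if ch != "-")
    return start + residues - 1


if __name__ == "__main__":
    for label, n in (("Identities", 301), ("Positives", 353), ("Gaps", 5)):
        print(f"{label} = {n}/459 ({blast_percent(n, 459)}%)")
    # Query segment starting at 736, written with its two gap columns as '--'.
    query = "LAANALNWCVNEMEKRYRLMSFMGVRNLAGFNQKIAEAAARGEKIGNPFSLTPDDPEP--"
    print(alignment_end(736, query))  # 793
```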
Based on this analysis, it is predicted that the proteins from N. meningitidis and N. gonorrhoeae,
and their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies.

Example 59 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 497>: 

1 ATGATTTATC AAAGAAACCT CATCAAAGAA CTCTCTTTTA CCGCCGTCGG 

51 CATTTTCGTC GTCCTCTTGG CGGTATTGGT CTCCACGCAG GCAATCAACC 

101 TGCTCGGCCG TGCCGCCGAC GGGC..GTGA TCGCCATCGA TGCCGTGTTG

151 GCATTGGTCG GCTTCTGGGT C 

// 

901 A TTGCCATCGG TTTGTTTTTA ATTTACCAAA ACGGGCTGAC 

951 CCTGCTTTTT GAAGCCGTGG AAGACGGCAA AATCCATTTT TGGCTCGGAC 

1001 TGCTGCCTAT GCACATTATC ATGTTTGTCC TTGCACTCAT CCTGTTGCGC 

1051 GTCCGCAGTA TGCCCAGCCA GCCCTTCTGG CAGGCGGTTG GCAAAAGTCT 

1101 GACATTGAAA GGCGGAAAAT GA 

This corresponds to the amino acid sequence <SEQ ID 498; ORF101>: 



1 MIYQRNLIKE LSFTAVGIFV VLLAVLVSTQ AINLLGRAAD GXVIAIDAVL 
51 ALVGFWV 

// 

301 ...IAIGLFL IYQNGLTLLF EAVEDGKIHF WLGLLPMHII MFVLALILLR
351 VRSMPSQPFW QAVGKSLTLK GGK* 

Further work revealed the complete nucleotide sequence <SEQ ID 499>: 



1 ATGATTTATC AAAGAAACCT CATCAAAGAA CTCTCTTTTA CCGCCGTCGG

51 CATTTTCGTC GTCCTCTTGG CGGTATTGGT CTCCACGCAG GCAATCAACC

101 TGCTCGGCCG TGCCGCCGAC GGGCGTGTCG CCATCGATGC CGTGTTGGCA

151 TTGGTCGGCT TCTGGGTCAT CGGTATGACG CCGCTTTTGC TGGTGTTGAC

201 CGCATTTATC AGTACGTTGA CCGTGTTGAC CCGCTACTGG CGCGACAGCG

251 AAATGTCGGT CTGGCTATCC TGCGGATTGG CATTGAAACA ATGGATACGC

301 CCGGTGATGC AGTTTGCCGT GCCGTTTGCC GTTTTGGTTG CCGTCATGCA

351 GCTTTGGGTG ATACCGTGGG CAGAGCTACG CAGCCGCGAA TACGCTGAAA

401 TCCTGAAGCA GAAGCAGGAA TTGTCTTTGG TGGAGGCAGG CGAGTTCAAC

451 AGTTTGGGCA AGCGCAACGG CAGGGTTTAT TTTGTCGAAA CCTTCGATAC

501 CGAATCCGGC ATCATGAAAA ACCTGTTCCT GCGCGAACAG GACAAAAACG

551 GCGGCGACAA CATCATCTTC GCCAAAGAAG GTAACTTCTC GCTGAACGAC

601 AACAAACGCA CGCTCGAATT GCGCCACGGC TACCGTTACA GCGGCACGCC

651 CGGACGCGCC GACTACAATC AGGTTTCCTT CCAAAAACTC AACCTGATTA

701 TCAGCACCAC GCCCAAACTC ATCGACCCCG TTTCCCACCG CCGTACCATT

751 CCGACCGCCC AACTGATTGG CAGCAGCAAC CCGCAACATC AGGCGGAATT

801 GATGTGGCGC ATCTCGCTGA CCGTCAGCGT CCTCCTACTC TGCCTGCTTG

851 CCGTGCCGCT TTCCTATTTC AACCCGCGCA GCGGACATAC CTACAATATC

901 TTGATTGCCA TCGGTTTGTT TTTAATTTAC CAAAACGGGC TGACCCTGCT

951 TTTTGAAGCC GTGGAAGACG GCAAAATCCA TTTTTGGCTC GGACTGCTGC

1001 CTATGCACAT TATCATGTTT GCCGTTGCAC TCATCCTGTT GCGCGTCCGC

1051 AGTATGCCCA GCCAGCCCTT CTGGCAGGCG GTTGGCAAAA GTCTGACATT

1101 GAAAGGCGGA AAATGA
GTCTGACATT 



This corresponds to the amino acid sequence <SEQ ID 500; ORF101-1>:



1 MIYQRNLIKE LSFTAVGIFV VLLAVLVSTQ AINLLGRAAD GRVAIDAVLA

51 LVGFWVIGMT PLLLVLTAFI STLTVLTRYW RDSEMSVWLS CGLALKQWIR

101 PVMQFAVPFA VLVAVMQLWV IPWAELRSRE YAEILKQKQE LSLVEAGEFN

151 SLGKRNGRVY FVETFDTESG IMKNLFLREQ DKNGGDNIIF AKEGNFSLND

201 NKRTLELRHG YRYSGTPGRA DYNQVSFQKL NLIISTTPKL IDPVSHRRTI

251 PTAQLIGSSN PQHQAELMWR ISLTVSVLLL CLLAVPLSYF NPRSGHTYNI

301 LIAIGLFLIY QNGLTLLFEA VEDGKIHFWL GLLPMHIIMF AVALILLRVR

351 SMPSQPFWQA VGKSLTLKGG K* 
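The nucleotide listing and the amino acid sequence it "corresponds to" are related by codon-by-codon translation in the first reading frame. The sketch below shows that step. It assumes the standard genetic code (NCBI translation table 1), and the function names are hypothetical. Lowercase bases are accepted. Codons that contain N or another ambiguity symbol come out as X, which resembles how X appears in the strain A sequences.

```python
# Standard genetic code (NCBI translation table 1), codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def listing_to_sequence(listing: str) -> str:
    """Join a numbered listing ('1 ATGATTTATC AAAGAAACCT ...') into one uppercase string."""
    return "".join(ch for ch in listing.upper() if ch.isalpha())


def translate(dna: str) -> str:
    """Translate frame 1; '*' is a stop codon, 'X' a codon with an ambiguous base."""
    dna = dna.upper()
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3))


if __name__ == "__main__":
    first_row = "1 ATGATTTATC AAAGAAACCT CATCAAAGAA"
    print(translate(listing_to_sequence(first_row)))  # MIYQRNLIKE
    print(translate("GGCGGAAAATGA"))  # GGK*
```

The first 30 nucleotides of SEQ ID 499 give MIYQRNLIKE, the opening residues of ORF101-1. The last twelve, GGCGGAAAATGA, give the C-terminal GGK followed by the TGA stop.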



Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A)

ORF101 shows 91.2% identity over a 57aa overlap and 95.7% identity over a 69aa overlap with
an ORF (ORF101a) from strain A of N. meningitidis:

10 20 30 40 50 

orf101.pep   MIYQRNLIKELSFTAVGIFVVLLAVLVSTQAINLLGRAADGXVIAIDAVLALVGFWVX
             |||||||||||||||||||||||||||||||||||| |||    ||||||||||||||
orf101a      MIYQRNLIKELSFTAVGIFVVLLAVLVSTQAINLLGXAADXRX-AIDAVLALVGFWVXXM

10 20 30 40 50 

// 

90 100 110
orf101.pep                                IAIGLFLIYQNGLTLLFEAVEDGKIHFWLGL
                                           ||||||||||||||||||||||||||||||
orf101a      LTVSVLLLCLLAVPLSYFNPRSGHTYNILXAIGLFLIYQNGLTLLFEAVEDGKIHFWLGL
280 290 300 310 320 330 



120 130 140 150 

orf101.pep   LPMHIIMFVLALILLRVRSMPSQPFWQAVGKSLTLKGGKX
             |||||||||:|::|||||||||||||||||||||||||||
orf101a      LPMHIIMFVIAIVLLRVRSMPSQPFWQAVGKSLTLKGGKX

340 350 360 370 

The complete length ORF101a nucleotide sequence <SEQ ID 501> is:



1 ATGATTTATC AAAGAAACCT CATCAAAGAA CTCTCTTTTA CCGCCGTCGG

51 CATTTTCGTC GTCCTCTTGG CGGTATTGGT CTCCACGCAG GCAATCAACC

101 TGCTCGGCCN TGCCGCCGAC NGGCGTNTCG CCATCGATGC CGTGTTGGCA

151 TTGGTCGGCT TCTGGGTCNN NNGNATGACG CCGCTTTTGC TNGTGTTGAC

201 CGCATTTATC AGTACGTTGA CCGTGTTGAC CCGCTACTGG CGNGACAGCG

251 AAATGTCGGT CTGGNTATCC TGCGGATTGG CATTGAAACA ATGGATACGC

301 CCGGTGATGC AGTTTGCCGT GCCGTTTGCC GTTTTGGTTG CCGTCATGCA

351 GCTTTGGGTG ATACCGTGGG CAGAGCTACG CAGCCGCGAA TACGCTGAAA

401 TCCTGAAGCA GAAGCAGGAA TTGTCTTTGG TGGAGGCAGG CGGGTTCAAC

451 AGTTTGGGCA AGCGCAACGG CAGGGTTTAT TTTGTCGAAA CCTTCGATAC

501 CGAATCCGGC ATCATGAAAA ACCTGTTCCT GCGCGAACAG GACAAAAACG

551 GCGGCGACAA CATCATCTTC NCCAAAGAAA GTAACTTCTC GCTGAACGAC

601 AACAAACGCA CGCTCGAATT GCGCCACGGC TACCGTTACA GCGGCACGCC

651 CGGACGCGCC GACTACAATC AGGTTTCCTT CCNAAAACTC AACCTGATTA

701 TCAGCACCAC GCCCAAACTC ATCGACCCCG TTTCCCACCG CCGTACNATN

751 CCNACNGCCC AACTGATTGG CAGCAGCAAC CCGCAACATC ANGCGGAATT

801 GATGTGGCGC ATCTCGCTGA CCGTCAGCGT CCTCCTACTC TGCCTGCTTG

851 CCGTGCCGCT TTCCTATTTC AACCCGCGCA GCGGACATAC CTACAATATC

901 TTGANTGCCA TCGGTTTGTT TTTAATTTAC CAAAACGGGC TGACCCTGCT

951 TTTTGAAGCC GTGGAAGACG GCAAAATCCA TTTTTGGCTC GGACTGCTGC

1001 CTATGCACAT CATCATGTTC GTCATCGCAA TCGTACTTCT GCGCGTCCGC

1051 AGCATGCCCA GCCAGCCCTT CTGGCAGGCG GTTGGCAAAA GTCTGACATT

1101 GAAAGGCGGA AAATGA



This encodes a protein having amino acid sequence <SEQ ID 502>: 



1 MIYQRNLIKE LSFTAVGIFV VLLAVLVSTQ AINLLGXAAD XRXAIDAVLA

51 LVGFWVXXMT PLLLVLTAFI STLTVLTRYW RDSEMSVWXS CGLALKQWIR

101 PVMQFAVPFA VLVAVMQLWV IPWAELRSRE YAEILKQKQE LSLVEAGGFN

151 SLGKRNGRVY FVETFDTESG IMKNLFLREQ DKNGGDNIIF XKESNFSLND

201 NKRTLELRHG YRYSGTPGRA DYNQVSFXKL NLIISTTPKL IDPVSHRRTX

251 PTAQLIGSSN PQHXAELMWR ISLTVSVLLL CLLAVPLSYF NPRSGHTYNI

301 LXAIGLFLIY QNGLTLLFEA VEDGKIHFWL GLLPMHIIMF VIAIVLLRVR

351 SMPSQPFWQA VGKSLTLKGG K*

ORF101a and ORF101-1 show 95.4% identity in 371 aa overlap:



orf101a.pep  MIYQRNLIKELSFTAVGIFVVLLAVLVSTQAINLLGXAADXRXAIDAVLALVGFWVXXMT 60
             |||||||||||||||||||||||||||||||||||| ||| | |||||||||||||  ||
orf101-1     MIYQRNLIKELSFTAVGIFVVLLAVLVSTQAINLLGRAADGRVAIDAVLALVGFWVIGMT 60

orf101a.pep  PLLLVLTAFISTLTVLTRYWRDSEMSVWXSCGLALKQWIRPVMQFAVPFAVLVAVMQLWV 120
             |||||||||||||||||||||||||||| |||||||||||||||||||||||||||||||
orf101-1     PLLLVLTAFISTLTVLTRYWRDSEMSVWLSCGLALKQWIRPVMQFAVPFAVLVAVMQLWV 120

orf101a.pep  IPWAELRSREYAEILKQKQELSLVEAGGFNSLGKRNGRVYFVETFDTESGIMKNLFLREQ 180
             ||||||||||||||||||||||||||| ||||||||||||||||||||||||||||||||
orf101-1     IPWAELRSREYAEILKQKQELSLVEAGEFNSLGKRNGRVYFVETFDTESGIMKNLFLREQ 180

orf101a.pep  DKNGGDNIIFXKESNFSLNDNKRTLELRHGYRYSGTPGRADYNQVSFXKLNLIISTTPKL 240
             |||||||||| ||:||||||||||||||||||||||||||||||||| ||||||||||||
orf101-1     DKNGGDNIIFAKEGNFSLNDNKRTLELRHGYRYSGTPGRADYNQVSFQKLNLIISTTPKL 240

orf101a.pep  IDPVSHRRTXPTAQLIGSSNPQHXAELMWRISLTVSVLLLCLLAVPLSYFNPRSGHTYNI 300
             ||||||||| ||||||||||||| ||||||||||||||||||||||||||||||||||||
orf101-1     IDPVSHRRTIPTAQLIGSSNPQHQAELMWRISLTVSVLLLCLLAVPLSYFNPRSGHTYNI 300

orf101a.pep  LXAIGLFLIYQNGLTLLFEAVEDGKIHFWLGLLPMHIIMFVIAIVLLRVRSMPSQPFWQA 360
             | ||||||||||||||||||||||||||||||||||||||::|::|||||||||||||||
orf101-1     LIAIGLFLIYQNGLTLLFEAVEDGKIHFWLGLLPMHIIMFAVALILLRVRSMPSQPFWQA 360

orf101a.pep  VGKSLTLKGGK 371
             |||||||||||
orf101-1     VGKSLTLKGGK 371



Homology with a predicted ORF from N. gonorrhoeae 

ORF101 shows 96.5% identity in a 57aa overlap at the N-terminal domain and 95.1% identity in a
61aa overlap at the C-terminal domain with a predicted ORF (ORF101ng) from N.
gonorrhoeae:



orf101.pep   MIYQRNLIKELSFTAVGIFVVLLAVLVSTQAINLLGRAADGXVIAIDAVLALVGFWV 57
             ||||||||||||||||||||||||||||||||||||||||| | |||||||||||||
orf101ng     MIYQRNLIKELSFTAVGIFVVLLAVLVSTQAINLLGRAADGRV-AIDAVLALVGFWVIGM 59

//

orf101.pep                                 IAIGLFLIYQNGLTLLFEAVEDGKIHFWLG 333
                                           ||||||||||||||||||||||||||||||
orf101ng     SLTVSVLLLCLLAVPLSYFNPRSGHTYNILIAIGLFLIYQNGLTLLFEAVEDGKIHFWLG 331

orf101.pep   LLPMHIIMFVLALILLRVRSMPSQPFWQAVGKSLTLKGGK 373
             ||||||||||:|::|||||||||||||||||
orf101ng     LLPMHIIMFVIAIVLLRVRSMPSQPFWQAVG 362



The ORF101ng nucleotide sequence <SEQ ID 503> is predicted to encode a protein having partial
amino acid sequence <SEQ ID 504>:



1 MIYQRNLIKE LSFTAVGIFV VLLAVLVSTQ AINLLGRAAD GRVAIDAVLA

51 LVGFWVIGMT PLLLVLTAFI STLTVLTRYW RDSEMSVWLS CGLALKQWIR

101 PVMQFAVPFA ILIAVMQLWV IPWAELRSRE YAEILKQKQE LSLVEAGEFN

151 NLGKRNGRVY FVETFDTESG IMKNLFLREQ DKNGGDNIIF AKEGNFSLKD

201 NKRTLELRHG YRYSGTPGRA DYNQVSFQKL NLIISTTPKL IDPVSHRRTI

251 STAQLIGSSN PQHQAELMWR ISLTVSVLLL CLLAVPLSYF NPRSGHTYNI

301 LIAIGLFLIY QNGLTLLFEA VEDGKIHFWL GLLPMHIIMF VIAIVLLRVR

351 SMPSQPFWQA VG...



Further work revealed the complete nucleotide sequence <SEQ ID 505>: 



1 ATGATTTATC AAAGAAACCT CATCAAAGAA CTCTCTTTTA CCGCCGTCGG

51 CATTTTCGTC GTCCTCTTGG CGGTGTTGGT GTCCACGCAG GCGATCAACC

101 TGCTTGGCCG CGCAGCTGAC GGGCGTGTCG CCATCGATGC CGTGTTGGCC

151 TTAGTCGGCT TCTGGGTCAT CGGTATGACC CCGCTTTTGC TGGTGTTGAC

201 CGCATTCATC AGCACGCTGA CCGTATTGAC CCGCTACTGG CGCGACAGCG

251 AAATGTCGGT CTGGCTATCC TGCGGATTGG CGTTGAAACA GTGGATACGC

301 CCCGTCATGC AGTTTGCCGT GCCGTTTGCC ATCCTGATTG CCGTGATGCA

351 GCTTTGGGTG ATACCGTGGG CAGAGCTGCG CAGCCGCGAA TATGCCGAAA

401 TTTTGAAGCA GAAGCAGGAA TTGTCTTTGG TGGAAGCCGG CGAGTTCAAT

451 AACTTGGGCA AGCGCAACGG CAgggtttaT TtcgtcgaaA CCTTTGACAC

501 CGaatccgGC ATCATGAAAA ACCTGTtCCt GcGCGAACAG GACAAAAACG

551 gcggcgacaA CATCATCTTC GCcaaaGAag gtaactTctc gctgaaggaC

601 AACAAAcgca cgctcgaATT GCGCCACGGC TACCGTTACA GCGGcacgcC

651 CGGacGCGCc gactaCAATC AGGTTtcctt cCAAAAacTc aacctgATta

701 TCAGCACCAC GCCCAAacTT ATCGaccCCG TTTCCCACCG CCGCACCATT

751 tcgacCGCCC AAcTGATTGG CAGCAGCAAT CCGCAACATC AGGCAGAATT

801 GATGTGGCGC ATCTCGCTGA CCGTCAGCGT CCTCCTGCTC TGCCTACTCG

851 CCGTGCCGCT TTCCTATTTC AACCCGCGCA GCGGACATAC CTACAATATC

901 TTGATTGCCA TCGGTTTGTT TTTAATTTAC CAAAACGGGC TGACCCTGCT

951 TTTTGAAGCC GTGGAAGACG GCAAAATCCA TTTTTGGCTC GGACTGCTGC

1001 CTATGCACAT CATCATGTTC GTCATCGCAA TCGTACTTCT GCGCGTCCGC

1051 AGTATGCCCA GCCAGCCCTT CTGGCAGGCG GTTGGCAAAA GTCTGACATT

1101 GAAAGgcgGA AAATGA



This corresponds to the amino acid sequence <SEQ ID 506; ORF101ng-l>: 



1 MIYQRNLIKE LSFTAVGIFV VLLAVLVSTQ AINLLGRAAD GRVAIDAVLA

51 LVGFWVIGMT PLLLVLTAFI STLTVLTRYW RDSEMSVWLS CGLALKQWIR

101 PVMQFAVPFA ILIAVMQLWV IPWAELRSRE YAEILKQKQE LSLVEAGEFN

151 NLGKRNGRVY FVETFDTESG IMKNLFLREQ DKNGGDNIIF AKEGNFSLKD

201 NKRTLELRHG YRYSGTPGRA DYNQVSFQKL NLIISTTPKL IDPVSHRRTI

251 STAQLIGSSN PQHQAELMWR ISLTVSVLLL CLLAVPLSYF NPRSGHTYNI

301 LIAIGLFLIY QNGLTLLFEA VEDGKIHFWL GLLPMHIIMF VIAIVLLRVR

351 SMPSQPFWQA VGKSLTLKGG K*



ORF101ng-1 and ORF101-1 show 97.6% identity in 371 aa overlap:



10 20 30 40 50 60
orf101-1.pep   MIYQRNLIKELSFTAVGIFVVLLAVLVSTQAINLLGRAADGRVAIDAVLALVGFWVIGMT
               ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf101ng-1     MIYQRNLIKELSFTAVGIFVVLLAVLVSTQAINLLGRAADGRVAIDAVLALVGFWVIGMT
10 20 30 40 50 60

70 80 90 100 110 120
orf101-1.pep   PLLLVLTAFISTLTVLTRYWRDSEMSVWLSCGLALKQWIRPVMQFAVPFAVLVAVMQLWV
               ||||||||||||||||||||||||||||||||||||||||||||||||||:|:|||||||
orf101ng-1     PLLLVLTAFISTLTVLTRYWRDSEMSVWLSCGLALKQWIRPVMQFAVPFAILIAVMQLWV
70 80 90 100 110 120

130 140 150 160 170 180
orf101-1.pep   IPWAELRSREYAEILKQKQELSLVEAGEFNSLGKRNGRVYFVETFDTESGIMKNLFLREQ
               ||||||||||||||||||||||||||||||:|||||||||||||||||||||||||||||
orf101ng-1     IPWAELRSREYAEILKQKQELSLVEAGEFNNLGKRNGRVYFVETFDTESGIMKNLFLREQ
130 140 150 160 170 180

190 200 210 220 230 240
orf101-1.pep   DKNGGDNIIFAKEGNFSLNDNKRTLELRHGYRYSGTPGRADYNQVSFQKLNLIISTTPKL
               ||||||||||||||||||:|||||||||||||||||||||||||||||||||||||||||
orf101ng-1     DKNGGDNIIFAKEGNFSLKDNKRTLELRHGYRYSGTPGRADYNQVSFQKLNLIISTTPKL
190 200 210 220 230 240

250 260 270 280 290 300
orf101-1.pep   IDPVSHRRTIPTAQLIGSSNPQHQAELMWRISLTVSVLLLCLLAVPLSYFNPRSGHTYNI
               |||||||||| |||||||||||||||||||||||||||||||||||||||||||||||||
orf101ng-1     IDPVSHRRTISTAQLIGSSNPQHQAELMWRISLTVSVLLLCLLAVPLSYFNPRSGHTYNI
250 260 270 280 290 300

310 320 330 340 350 360
orf101-1.pep   LIAIGLFLIYQNGLTLLFEAVEDGKIHFWLGLLPMHIIMFAVALILLRVRSMPSQPFWQA
               ||||||||||||||||||||||||||||||||||||||||::|::|||||||||||||||
orf101ng-1     LIAIGLFLIYQNGLTLLFEAVEDGKIHFWLGLLPMHIIMFVIAIVLLRVRSMPSQPFWQA
310 320 330 340 350 360

370
orf101-1.pep   VGKSLTLKGGKX
               ||||||||||||
orf101ng-1     VGKSLTLKGGKX
370



Based on this analysis, including the presence of a putative leader sequence (double-underlined) 
and several putative transmembrane domains (single-underlined) in the gonococcal protein, it is 
predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be
useful antigens for vaccines or diagnostics, or for raising antibodies. 
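The conclusion above cites putative transmembrane domains in the gonococcal protein but does not say how they were identified. One widely used heuristic is the Kyte-Doolittle hydropathy plot. It averages residue hydropathy over a sliding window, commonly 19 residues, and flags windows whose mean exceeds about 1.6. The sketch below implements that heuristic only as an illustration. It is an assumption, not the method used in this analysis, and the function names are hypothetical.

```python
# Kyte-Doolittle hydropathy index (Kyte & Doolittle, 1982).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq: str, window: int = 19) -> list[float]:
    """Mean hydropathy of every window; symbols outside KD (X, *) score 0."""
    values = [KD.get(aa, 0.0) for aa in seq.upper()]
    return [sum(values[i:i + window]) / window for i in range(len(values) - window + 1)]


def candidate_tm_segments(seq: str, window: int = 19, threshold: float = 1.6) -> list[tuple[int, int]]:
    """1-based (start, end) spans covered by windows scoring above threshold; overlaps are merged."""
    segments: list[tuple[int, int]] = []
    for i, score in enumerate(hydropathy_profile(seq, window)):
        if score <= threshold:
            continue
        start, end = i + 1, i + window
        if segments and start <= segments[-1][1]:
            segments[-1] = (segments[-1][0], end)  # window overlaps the current span: extend it
        else:
            segments.append((start, end))
    return segments


if __name__ == "__main__":
    # Residues 271-289 of ORF101ng-1 (SEQ ID 506).
    window_271_289 = "ISLTVSVLLLCLLAVPLSY"
    print(round(hydropathy_profile(window_271_289)[0], 2))  # 2.21
```

For example, the ORF101ng-1 stretch ISLTVSVLLLCLLAVPLSY (residues 271-289) averages about 2.2, well above the 1.6 cut-off. Under this heuristic it would be a candidate membrane-spanning segment.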

Example 60 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 507>: 

1 ..GGTGGTGGTT TTATCAATGC TTCCTGTGCC ACTTTGACGA CAGCCAAACC

51 GCAATATCAA GCAGGAGACC TTAGCGCTTT TAAGATAAGG CAAGGCAATG 

101 TTGTAATCGC CGGACACGGT TTGGATGCAC GTGATACCGA TTACACACGT 

151 ATTCTCAGTT ATCATTCCAA AATCGATGCA CCCGTATGGG GACAAGATGT 

201 TCGTGTCGTC GCGGGACAAA ACGATGTGGC CGCAACAGGT GATGCACATT 

251 CGCCTATTCT CAATAATGCT GCTGCCAATA CGTCAAACAA TACAGCCAAC

301 AACGGCACAC ATATCCCTTT ATTTGCGATT GATACAGGCA AATTAGGAGG 

351 TAT.GTATGC CAACAAAATC ACCTTGATCA GTACGGTCGA GCAAGCAGGC

401 ATTCGTAA 

This corresponds to the amino acid sequence <SEQ ID 508; ORF113>:

1 ..GGGFINASCA TLTTAKPQYQ AGDLSAFKIR QGNVVIAGHG LDARDTDYTR

51 ILSYHSKIDA PVWGQDVRVV AGQNDVAATG DAHSPILNNA AANTSNNTAN
101 NGTHIPLFAI DTGKLGGXVC QQNHLDQYGR ASRHS* 

Computer analysis of this amino acid sequence gave the following results: 

Homology with the pspA putative secreted protein of N. meningitidis (accession AF030941)
ORF113 and pspA show 44% aa identity in 179aa overlap:

orf113 GGGFINASCATLTTAKPQYQAGDLSAFKIRQGNVVIAGHGLDARDTDYTRILSYHSKIDA 60

GGG INA+ TLT+ P G+L+ F + G WI G GLD D DYTRILS ++I+A
pspa GGGLINAASVTLTSGVPVLNNGNLTGFDVSSGKWIGGKGLDTSDADYTRILSRAAEINA 256

orf113 PVWGQDVRVVAGQNDVAATGDAHSPILXXXXXXXXXXXXXXGTHIPLFAIDTGKLGGMYA 120

VWG+DV+VV+G+N + G + P AIDT LGGMYA

pspa GVWGKDVKVVSGKNKLDFDG SLAKTASAPSSSDSVTPTVAIDTATLGGMYA 307

orf113 NKITLISTVEQAGIRNQGQWFASAGNVAVNAEGKLVNTGMIAATGENHAVSLHARNVHN 179
+KITLIST A IRN+G+ FA+ G V ++A+GKL N+G I A +++ A+ V N

pspa DKITLISTDNGAVIRNKGRIFAATGGVTLSADGKLSNSGSIDAA EITISAQTVDN 362

Homology with a predicted ORF from N. gonorrhoeae

ORF113 shows 86.5% identity in a 52aa overlap at the N-terminal part and 94.1% identity in a 17aa
overlap at the C-terminal part with a predicted ORF (ORF113ng) from N. gonorrhoeae:

orf113                                    GGGFINASCATLTTAKPQYQAGDLSAFKIR 30
                                          |||||||| |||||::|||||||:|:||||
orf113ng  SHPSQLNGYIEVGGRRAEVVIANPAGIAVNGGGFINASRATLTTGQPQYQAGDFSGFKIR 224

orf113    QGNVVIAGHGLDARDTDYTRILSYHSKIDAPVWGQDVRVVAGQNDVAATGDAHSPILNNA 90
          |||:|||||||||||||:||||
orf113ng  QGNAVIAGHGLDARDTDFTRILVCQQNHLDQYGRTSRHS 263

orf113                         IDTGKLGGXVCQQNHLDQYGRASRHS 135
                                        ||||||||||||:||||
orf113ng  DFSGFKIRQGNAVIAGHGLDARDTDFTRILVCQQNHLDQYGRTSRHS 263

The complete length ORF113ng nucleotide sequence <SEQ ID 509> is predicted to encode a 
protein having amino acid sequence <SEQ ID 510>: 

1 MNKTLYRVIF NRKRGAVVAV AETTKREGKS CADSGSGSVY VKSVSFIPTH

51 SKAFCFSALG FSLCLALGTV NIAFADGIIT DKAAPKTQQA TILQTGNGIP
101 QVNIQTPTSA GVSVNQYAQF DVGNRGAILN NSRSNTQTQL GGWIQGNPWL
151 TRGEARVVVN QINSSHPSQL NGYIEVGGRR AEVVIANPAG IAVNGGGFIN
201 ASRATLTTGQ PQYQAGDFSG FKIRQGNAVI AGHGLDARDT DFTRILVCQQ 
251 NHLDQYGRTS RHS* 

Based on this analysis, it is predicted that these proteins from N. meningitidis and N. gonorrhoeae,
and their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies. 



Example 61 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 511>:

1 ..TCAACGGGAC ATAGCGAACA AAATTACACT TTGCCGCGAG AAATCACACG

51 CAACATTTCA CTGGGTTCAT TTGCCTATGA ATCGCATCGC AAAGCATTAA 

101 GCCATCATGC GCCCAGCCAA GGCACTGAGT TGCCGCAAAG CAACGGTATT 

151 TCGCTACCCT ATACGTCCAA TTCTTTTACC CCATTACCCA GCAGCAGCTT 

201 ATACATTATC AATCCTGTCA ATAAAGGCTA TCTTGTTGAA ACCGATCCAC 

251 GCTTTGCCAA CTACCGTCAA TGGTTGGGTA GTGACTATAT GCtGGACAGC 

301 CTCAAACTAG ACCCAAACAA TTTACATAAA CGTTTGGGTG ATGGTTATTA 

351 CGAGCAACGT TTAATCAATG AACAAATCGC AGAGCTGACA GGGCATCGTC 

401 GTTTAGAcGG TTATCAAAAC GACGAAGAAC AATTTAAAGC CTTAATGGAT 

451 AATGGCGCGA CTGCGGCACG TTcGATGAAT CTCAGCGTTG GCATTGCATT 

501 AAGTGCCGAG CAAGTAGCGC AACTGACCAG CGATATTGTT TGGTTGGTAC 

551 AAAAAGAAGT TAAGCTTCCT GATGGCGGCA CACAAACCGT ATTGGTGCCA 

601 CAGGTTTATG TACGCGTTAA AAATGGCGAC ATAGACGGTA AAGGTGCATT 

651 GTTGTCAGGC AGCAATACAC AAATCAATGT TTCAGGCAGC CTGAAAAACT 

701 CAGGCACGAT TGCAGGgCGC AATGCGCTTA TTATCAATAC CGATACGCTA 

751 GACAATATCG GTGGGCGTAT TCATGCGCAA AAATCAGCGG TTACGGCCAC 

801 ACAAGACATC AATAATATTG GCGGCATGCT TTCTGCCGAA CAGACATTAT 

851 TGCTCAACGC AGGCAACAAC ATCAACAGCC AAAGCACCAC CGCCAGCAGT 

901 CAAAATACAC AAGGCAGCAG CACCTACCTA GACCGAATGG CAGGTATTTA 

951 TATCACAGGC AAAGAAAAAG GTGTTT..

This corresponds to the amino acid sequence <SEQ ID 512; ORF115>:

1 ..STGHSEQNYT LPREITRNIS LGSFAYESHR KALSHHAPSQ GTELPQSNGI

51 SLPYTSNSFT PLPSSSLYII NPVNKGYLVE TDPRFANYRQ WLGSDYMLDS 

101 LKLDPNNLHK RLGDGYYEQR LINEQIAELT GHRRLDGYQN DEEQFKALMD 

151 NGATAARSMN LSVGIALSAE QVAQLTSDIV WLVQKEVKLP DGGTQTVLVP 

201 QVYVRVKNGD IDGKGALLSG SNTQINVSGS LKNSGTIAGR NALIINTDTL 

251 DNIGGRIHAQ KSAVTATQDI NNIGGMLSAE QTLLLNAGNN INSQSTTASS 

301 QNTQGSSTYL DRMAGIYITG KEKGV..

Computer analysis of this amino acid sequence gave the following results: 

Homology with the pspA putative secreted protein of N. meningitidis (accession number AF030941) 
ORF115 and pspA protein show 50% aa identity in 325aa overlap:

Orf115: 1 STGHSEQNYTLPREITRNISLGSFAYESHRKALSHHAPSQGTELPQSNGISLPYTSNSFT 60

STG+S Y E++ +1 +G AY+ + + P + NGI +T 
pspA: 778 STGYSRSPYEPAPEVS-SIRMGISAYKGYAPQQASDIPGTWPWAENGIHPTFT 831 

Orf115: 61 PLPSSSLYIINPVNKGYLVETDPRFANYRQWLGSDYMLDSLKLDPNNLHKRLGDGYYEQR 120

LP+SSL+ I P NKGYL+ETDP F +YR+WLGS YML +L+ DPN++HKRLGDGYYEQ+ 
pspA: 832 -LPNSSLFAIAPNNKGYLIETDPAFTDYRKWLGSGYMLAALQQDPNHIHKRLGDGYYEQK 890 

Orf115: 121 LINEQIAELTGHRRLDGYQNDEEQFKALMDNGATAARSMNLSVGIALSAEQVAQLTSDIV 180

L+NEQIA+LTG+RRLDGY NDEEQFKALMDNG T A+ + L+ GIALSAEQVA+LTSDIV 
pspA: 891 LVNEQIAKLTGYRRLDGYTNDEEQFKALMDNGITIAKELQLTPGIALSAEQVARLTSDIV 950 

Orf115: 181 WLVQKEVKLPDGGTQTVLVPQVYVRVKNGDIDGKGALLSGSNTQINVSGSLKN-SGTIAG 239

WL + V LPDG TQTVL P+VYVR + D++G+GALLSGS I SG+++N G IAG 
pspA: 951 WLENETVTLPDGTTQTVLKPKVYVRARPKDMNGQGALLSGSWDIG-SGAIENRGGLIAG 1009 
Orf115: 240 RNALIINTDTLDNIGGRIHAQKSAVTATQDINNIGGMLSAEQTLLLNAGXXXXXXXXXXX 299

R ALI+N +N+G++ ADING+AE LLL A 

pspA: 1010 REALILNAQNIKNLQGDLQGKNIFAAAGSDITNTGS-IGAENALLLKASNNIESRSETRS 1068 



Orf115: 300 XXXXXXXXXYLDRMAGIYITGKEKG 324
                      + R+AGIY+TG++ G
pspA:  1069 NQNEQGSVRNIGRVAGIYLTGRQNG 1093

Homology with a predicted ORF from N. gonorrhoeae 

ORF115 shows 91.9% identity over a 334aa overlap with a predicted ORF (ORF115ng) from
N. gonorrhoeae:



orf115.pep                                 STGHSEQNYTLPREITRNISLGSFAYESHRK 31
                                            ||| |||||||:||||:||||||||||| |
orf115ng    NEQTFGEKKVFSENGKLHNYWRARRKGHDETGHREQNYTLPEEITRDISLGSFAYESHSK 71

orf115.pep  ALSHHAPSQGTELPQSN----------GISLPYTSNSFTPLPSSSLYIINPVNKGYLVET 81
            |||:|||||||||||||          ||||||| |||||||:||||||||:||||||||
orf115ng    ALSRHAPSQGTELPQSNRDNIRTAKSNGISLPYTPNSFTPLPGSSLYIINPANKGYLVET 131

orf115.pep  DPRFANYRQWLGSDYMLDSLKLDPNNLHKRLGDGYYEQRLINEQIAELTGHRRLDGYQND 141
            ||||||||||||||||| ||||||||||||||||||||||||||||||||||||||||||
orf115ng    DPRFANYRQWLGSDYMLGSLKLDPNNLHKRLGDGYYEQRLINEQIAELTGHRRLDGYQND 191

orf115.pep  EEQFKALMDNGATAARSMNLSVGIALSAEQVAQLTSDIVWLVQKEVKLPDGGTQTVLVPQ 201
            ||||||||||||||||||||||||||||||:||||||||||||||||||||||||||:||
orf115ng    EEQFKALMDNGATAARSMNLSVGIALSAEQAAQLTSDIVWLVQKEVKLPDGGTQTVLMPQ 251

orf115.pep  VYVRVKNGDIDGKGALLSGSNTQINVSGSLKNSGTIAGRNALIINTDTLDNIGGRIHAQK 261
            |||||||| |||||||||||||||||||||||||||||||||||||||||||||||||||
orf115ng    VYVRVKNGGIDGKGALLSGSNTQINVSGSLKNSGTIAGRNALIINTDTLDNIGGRIHAQK 311

orf115.pep  SAVTATQDINNIGGMLSAEQTLLLNAGNNINSQSTTASSQNTQGSSTYLDRMAGIYITGK 321
            ||||||||||||||:||||||||||||||||:|||: ||||:||||||||||||||||||
orf115ng    SAVTATQDINNIGGILSAEQTLLLNAGNNINNQSTAKSSQNAQGSSTYLDRMAGIYITGK 371

orf115.pep  EKGV 325
            ||||
orf115ng    EKGVLAAQAGKDINIIAGQISNQSDQGQTRLQAGRDINLDTVQTGKYQEIHFDADNHTIR 431



An ORF115ng nucleotide sequence <SEQ ID 513> was predicted to encode a protein having amino
acid sequence <SEQ ID 514>:



1 MLVQTEKDGL HNEQTFGEKK VFSENGKLHN YWRARRKGHD ETGHREQNYT

51 LPEEITRDIS LGSFAYESHS KALSRHAPSQ GTELPQSNRD NIRTAKSNGI

101 SLPYTPNSFT PLPGSSLYII NPANKGYLVE TDPRFANYRQ WLGSDYMLGS

151 LKLDPNNLHK RLGDGYYEQR LINEQIAELT GHRRLDGYQN DEEQFKALMD

201 NGATAARSMN LSVGIALSAE QAAQLTSDIV WLVQKEVKLP DGGTQTVLMP

251 QVYVRVKNGG IDGKGALLSG SNTQINVSGS LKNSGTIAGR NALIINTDTL

301 DNIGGRIHAQ KSAVTATQDI NNIGGILSAE QTLLLNAGNN INNQSTAKSS

351 QNAQGSSTYL DRMAGIYITG KEKGVLAAQA GKDINIIAGQ ISNQSDQGQT

401 RLQAGRDINL DTVQTGKYQE IHFDADNHTI RGSTNEVGSS IQTKGDVTLL

451 SGNNLNAKAA EVGSAKGTLA VYAKNDITIS SGIHAGQVDD ASKHTGRSGG

501 GNKLVITDKA QSHHETAQSS TFEGKQVVLQ AGNDANILGS NVISDNGTRI

551 QAGNHVRIGT TQTQSQSETY HQTQKSGLMS AGIGFTIGSK TNTQENQSQS

601 NEHTGSTVGS LKGDTTIVAS KHYEQTGSNV SSPEGNNLIS TQSMDIGAAQ

651 NQLNSKTTQT YEQKGLTVAF SSPVTDLAQQ AIAVAHKAAK QFDKAKTTAL

701 MPWRLPMQVG RLFKQAKAPK K*



Further work revealed the following partial gonococcal DNA sequence <SEQ ID 515>:



1 TTGCTTGTGC AAACAGAAAA AGACGGTTTG CATAACGAGC AAACCTTTGG 

51 CGAGAAGAAA GTCTTCAGCG AAAATGGTAA GTTGCACAAC TACTGGCGTG 

101 CGCGTCGTAA AGGACATGAT GAAACAGGGC ATCGTGAACA AAATTATACT 

151 TTGCCGGAGG AAATCACACG CGACATTTCA CTGGGTTCAT TTGCCTATGA

201 ATCGCATAGC AAAGCATTAA GCCGTCATGC GCCCAGCCAA GGCACTGAGT 

251 TGCCACAAAG TAACCGGGAT AATATCCGTA CTGCGAAAAG CAACGGTATT 
301 TCGCTACCCT ATACGCCCAA TTCTTTTACC CCATTACCCG GCAGCAGCTT

351 ATACATTATC AATCCTGCCA ATAAAGGCTA TCTTGTTGAA ACCGATCCAC 

401 GCTTTGCCAA CTACCGTCAA TGGTTGGGTA GTGACTATAT GCTGGGCAGC 

451 CTCAAACTAG ACCCAAACAA TTTACATAAA CGTTTGGGTG ATGGTTATTA 

501 CGAGCAACGT TTAATCAATG AACAAATCGC AGAGCTGACA GGGCATCGTC

551 GTTTAGACGG TTATCAAAAC GACGAAGAAC AATTTAAAGC CTTAATGGAT 

601 AATGGCGCGA CTGCGGCACG TTCGATGAAT CTCAGCGTTG GCATTGCATT 

651 AAGTGCCGAG CAAGCAGCGC AACTGACCAG CGATATTGTT TGGTTGGTAC 

701 AAAAAGAAGT TAAACTTCCT GATGGCGGCA CACAAACCGT ATTGATGCCA 

751 CAGGTTTATG TACGCGTTAA AAATGGCGGC ATAGACGGTA AAGGTGCATT 

801 GTTGTCAGGC AGCAATACAC AAATCAATGT TTCAGGCAGC CTGAAAAACT 

851 CAGGCACGAT TGCAGGGCGC AATGCGCTTA TTATCAATAC CGATACGCTA 

901 GACAATATCG GTGGGCGTAT TCATGCGCAA AAATCAGCGG TTACGGCCAC 

951 ACAAGACATC AATAATATTG GCGGCATTCT TTCTGCCGAA CAGACATTAT 

1001 TGCTCAATGC GGGTAACAAC ATCAACAACC AAAGCACGGC CAAGAGCAGT 

1051 CAAAATGCAC AAGGTAGCAG CACCTACCTA GACCGAATGG CAGGTATTTA 

1101 TATCACAGGC AAAGAAAAAG GTGTTTTAGC AGCGCAGGCA GGCAAAGACA 

1151 TCAACATCAT TGCCGGTCAA ATCAGCAATC AATCAGATCA AGGGCAAACC 

1201 CGGCTGCAGG CAGGACGCGA CATTAACCTG GATACGGTAC AAACCGGCAA 

1251 ATATCAAGAA ATCCATTTTG ATGCCGATAA CCATACCATC CGAGGTTCAA 

1301 CGAACGAAGT CGGCAGCAGC ATTCAAACAA AAGGCGATGT TACCCtatTG 

1351 TCAGGGAATA ATCTCAATGC CAAAGCTGCC GAAGTCGGCA GCGCAAAAGG 

1401 CACACTTGCC GTGTATGCTA AAAATGACAT TACTATCAGC TCAGGCATCC 

1451 ATGCCGGCCA AGTTGATGAT GCGTCCAAAC ATACAGGCAG AAGCGGCGGC 

1501 GGTAATAAAT TAGTCATTAC CGATAAAGCC CAAAGTCATC ACGAAACTGC 

1551 TCAAAGCAGC ACCTTTGAAG GCAAGCAAGT TGTATTGCAG GCAGGAAACG 

1601 ATGCCAACAT CCTTGGCAGT AATGTTATTT CCGATAATGG CACCCGGATT 

1651 CAAGCAGGCA ATCATGTTCG CATTGGTACA ACCCAAACTC AAAGCCAAAG 

1701 CGAAACCTAT CATCAAACCC AAAAATCAGG ATTGATGAGT GCAGGTATCG 

1751 GCTTCACTAT TGGCAGCAAG ACAAACACAC AAGAAAACCA ATCCCAAAGC 

1801 AACGAACATA CAGGCAGTAC CGTAGGCAGC CTGAAAGGCG ATACCACCAT 

1851 TGTTGCAAGC AAACACTACG AACAAACCGG CAGCAACGTT TCCAGCCCTG 

1901 AGGGCAACAA CCTTATCAGC ACGCAAAGTA TGGATATTGG CGCAGCACAA 

1951 AACCAATTAA ACAGCAAAAC CACCCAAACC TACGAACAAA AAGGCTTAAC 

2001 GGTGGCATTC AGTTCGCCCG TTACCGATTT GGCACAACAA GCGATTGCCG 

2051 TAGCACACAA AGCAGCAAAC AAGTCGGACA AAGCAAAAAC GACCGCGTTA 

2101 ATGCCATGGC GGCTGCCAAT GCAGGTTGGC AGGCCTATCA AACAGGCAAA 

2151 GGCGCACAAA ACTTAG 

This corresponds to the amino acid sequence <SEQ ID 516; ORF115ng-1>:

1 LLVQTEKDGL HNEQTFGEKK VFSENGKLHN YWRARRKGHD ETGHREQNYT 

51 LPEEITRDIS LGSFAYESHS KALSRHAPSQ GTELPQSNRD NIRTAKSNGI 

101 SLPYTPNSFT PLPGSSLYII NPANKGYLVE TDPRFANYRQ WLGSDYMLGS 

151 LKLDPNNLHK RLGDGYYEQR LINEQIAELT GHRRLDGYQN DEEQFKALMD 

201 NGATAARSMN LSVGIALSAE QAAQLTSDIV WLVQKEVKLP DGGTQTVLMP 

251 QVYVRVKNGG IDGKGALLSG SNTQINVSGS LKNSGTIAGR NALIINTDTL 

301 DNIGGRIHAQ KSAVTATQDI NNIGGILSAE QTLLLNAGNN INNQSTAKSS 

351 QNAQGSSTYL DRMAGIYITG KEKGVLAAQA GKDINIIAGQ ISNQSDQGQT 

401 RLQAGRDINL DTVQTGKYQE IHFDADNHTI RGSTNEVGSS IQTKGDVTLL 

451 SGNNLNAKAA EVGSAKGTLA VYAKNDITIS SGIHAGQVDD ASKHTGRSGG

501 GNKLVITDKA QSHHETAQSS TFEGKQVVLQ AGNDANILGS NVISDNGTRI

551 QAGNHVRIGT TQTQSQSETY HQTQKSGLMS AGIGFTIGSK TNTQENQSQS 

601 NEHTGSTVGS LKGDTTIVAS KHYEQTGSNV SSPEGNNLIS TQSMDIGAAQ 

651 NQLNSKTTQT YEQKGLTVAF SSPVTDLAQQ AIAVAHKAAN KSDKAKTTAL 

701 MPWRLPMQVG RPIKQAKAHK T* 

This gonococcal protein (ORF115ng-1) shows 91.9% identity with ORF115 over a 334aa overlap:

20 30 40 50 60 70 

orf115ng-1.p NEQTFGEKKVFSENGKLHNYWRARRKGHDETGHREQNYTLPEEITRDISLGSFAYESHSK

Ml I I I I I I I : I I I I : I I I I I I I M I I I 
orf115 STGHSEQNYTLPREITRNISLGSFAYESHRK

10 20 30 

80 90 100 110 120 130 

orf115ng-1.p ALSRHAPSQGTELPQSNRDNIRTAKSNGISLPYTPNSFTPLPGSSLYIINPANKGYLVET
I I I : I I I I I I ! I I I I I I lllilll I I t I I t I: I I I I I I I I: I I I II I I I 

orf115 ALSHHAPSQGTELPQSN GISLPYTSNSFTPLPSSSLYIINPVNKGYLVET

40 50 60 70 80 



WO 99/24578 



-303- 



PCT/IB98/01665 



140 150 160 170 180 190 

orf115ng-1.p DPRFANYRQWLGSDYMLGSLKLDPNNLHKRLGDGYYEQRLINEQIAELTGHRRLDGYQND
I I I I I I I I ! I I I ! I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
orf115 DPRFANYRQWLGSDYMLDSLKLDPNNLHKRLGDGYYEQRLINEQIAELTGHRRLDGYQND

90 100 110 120 130 140 






200 210 220 230 240 250 

orf115ng-1.p EEQFKALMDNGATAARSMNLSVGIALSAEQAAQLTSDIVWLVQKEVKLPDGGTQTVLMPQ
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I : I I I I I I I I I I I I I I I I I I I I I I I I I I : I I 
orf115 EEQFKALMDNGATAARSMNLSVGIALSAEQVAQLTSDIVWLVQKEVKLPDGGTQTVLVPQ

150 160 170 180 190 200 






260 270 280 290 300 310 

orf115ng-1.p VYVRVKNGGIDGKGALLSGSNTQINVSGSLKNSGTIAGRNALIINTDTLDNIGGRIHAQK
I! mm: M M I I I I I I I I I I I I I I I I ! I I I II II I I I I I I I I I I I I I I i I I I I I M I 
orf115 VYVRVKNGDIDGKGALLSGSNTQINVSGSLKNSGTIAGRNALIINTDTLDNIGGRIHAQK

210 220 230 240 250 260 



320 330 340 350 360 370 

orf115ng-1.p SAVTATQDINNIGGILSAEQTLLLNAGNNINNQSTAKSSQNAQGSSTYLDRMAGIYITGK

I I I I I I I I I I I I I I : I I I I I I I I I I I I I I I I : I I I : I I I I : I I I I I I I I I I I I I I I I I I 
orf115 SAVTATQDINNIGGMLSAEQTLLLNAGNNINSQSTTASSQNTQGSSTYLDRMAGIYITGK
270 280 290 300 310 320

380 390 400 410 420 430

orf115ng-1.p EKGVLAAQAGKDINIIAGQISNQSDQGQTRLQAGRDINLDTVQTGKYQEIHFDADNHTIR
I I I I 

orf115 EKGV

In addition, it shows homology with a secreted N. meningitidis protein in the database: 

gi|2623258 (AF030941) putative secreted protein [Neisseria meningitidis] Length = 2273

Score = 604 bits (1541), Expect = e-172 

Identities = 325/678 (47%), Positives = 449/678 (65%), Gaps = 22/678 (3%)

Query: 1 LLVQTEKDGLHNEQTFGEKKVFSENGKLHNYWRARRKGHDETGHREQNYTLPEEITRDIS 60

L+V T + L N++T G K + ++ G LH Y R +KG D TG+ Y E++ I 
Sbjct: 739 LIVGTPESALDNDETLGTKTI-TDKGDLHRYHRHHKKGRDSTGYSRSPYEPAPEVS-SIR 796 

Query: 61 LGSFAYESHSKALSRHAPSQGTELPQSNRDNIRTAKSNGISLPYTPNSFTPLPGSSLYII 120
+G AY+ + AP Q +++P + + NGI +T LP SSL+ I

Sbjct: 797 MGISAYKGY APQQASDIPGTV VPWAENGIHPTFT LPNSSLFAI 840

Query: 121 NPANKGYLVETDPRFANYRQWLGSDYMLGSLKLDPNNLHKRLGDGYYEQRLINEQIAELT 180 
P NKGYL+ETDP F +YR+WLGS YML +L+ DPN++HKRLGDGYYEQ+L+NEQIA+LT 
Sbjct: 841 APNNKGYLIETDPAFTDYRKWLGSGYMLAALQQDPNHIHKRLGDGYYEQKLVNEQIAKLT 900









Query: 181 GHRRLDGYQNDEEQFKALMDNGATAARSMNLSVGIALSAEQAAQLTSDIVWLVQKEVKLP 240 

G+RRLDGY NDEEQFKALMDNG T A+ + L+ GIALSAEQ A+LTSDIVWL + V LP 
Sbjct: 901 GYRRLDGYTNDEEQFKALMDNGITIAKELQLTPGIALSAEQVARLTSDIVWLENETVTLP 960 

Query: 241 DGGTQTVLMPQVYVRVKNGGIDGKGALLSGSNTQINVSGSLKN-SGTIAGRNALIINTDT 299 

DG TQTVL P+VYVR + ++G+GALLSGS I SG+++N G IAGR ALI+N 
Sbjct: 961 DGTTQTVLKPKVYVRARPKDMNGQGALLSGSVVDIG-SGAIENRGGLIAGREALILNAQN 1019

Query: 300 LDNIGGRIHAQKSAVTATQDINNIGGILSAEQTLLLNAGNNINNQSTAKSSQNAQGSSTY 359 

+ N+G++ ADINGIAE LLL A NNI ++S +S+QN QGS 

Sbjct: 1020 IKNLQGDLQGKNIFAAAGSDITNTGSI-GAENALLLKASNNIESRSETRSNQNEQGSVRN 1078 






Query: 360 LDRMAGIYITGKEKGVLAAQAGKDINIIAGQISNQSDQGQTRLQAGRDINLDTVQTGKYQ 419 

+ R+AGIY+TG++ G + AG +I + A +++NQS+ GQT L AG DI DT + Q
Sbjct: 1079 IGRVAGIYLTGRQNGSVLLDAGNNIVLTASELTNQSEDGQTVLNAGGDIRSDTTGISRNQ 1138 









Query: 420 EIHFDADNHTIRGSTNEVGSSIQTKGDVTLLSGNNLNAKAAEVGSAKGTLAVYAKNDITI 479 

FD+DN+ IR NEVGS+I+T+G+++L + ++ +AAEVGS +G L + A DI + 
Sbjct: 1139 NTIFDSDNYVIRKEQNEVGSTIRTRGNLSLNAKGDIRIRAAEVGSEQGRLKLAAGRDIKV 1198 

Query: 480 SSGIHAGQVDDASKHTGRSGGGNKLVITDKAQSHHETAQSSTFEGKQVVLQAGNDANILG 539

+G + +DA K+TGRSGGG K +T ++ + A S T +GK+++L +G D + G 
Sbjct: 1199 EAGKAHTETEDALKYTGRSGGGIKQKMTRHLKNQNGQAVSGTLDGKEIILVSGRDITVTG 1258 
Query: 540 SNVISDNGTRIQAGNHVRIGTTQTQSQSETYHQTQKSGLM-SAGIGFTIGSKTNTQENQS 598

SN+I+DN T + A N++ + +T+S+S ++ +KSGLM S GIGFT GSK +TQ N+S 
Sbjct: 1259 SNIIADNHTILSAKNNIVLKAAETRSRSAEMNKKEKSGLMGSGGIGFTAGSKKDTQTNRS 1318 

Query: 599 QSNEHTGSTVGSLKGDTTIVASKHYEQTGSNVSSPEGNNLISTQSMDIGAAQNQLNSKTT 658 

++ HT S VGSL G+T I A KHY QTGS +SSP+G+ IS+ + I AAQN+ + ++ 
Sbjct: 1319 ETVSHTESVVGSLNGNTLISAGKHYTQTGSTISSPQGDVGISSGKISIDAAQNRYSQESK 1378

Query: 659 QTYEQKGLTVAFSSPVTD 676 

Q YEQKG+TVA S PV + 
Sbjct: 1379 QVYEQKGVTVAISVPVVN 1396
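As a reading aid for the BLAST statistics quoted above: in BLAST pairwise output the middle line shows the residue letter where query and subject are identical and "+" where the substitution scores positively, so the "Positives" count includes the identities. The following is a minimal Python sketch of that counting rule, assuming intact midlines; the helper name blast_counts is hypothetical, and because several OCR-derived midlines reproduced here have lost letters, recounting them will not necessarily reproduce the reported totals.

```python
def blast_counts(midlines):
    """Return (identities, positives) counted from BLAST pairwise midlines.

    Assumes the standard BLAST text layout: a residue letter marks an identical
    position, '+' marks a positive-scoring substitution, and a blank marks
    neither. Positives therefore include the identities.
    """
    identities = 0
    plus_only = 0
    for line in midlines:
        for ch in line:
            if ch.isalpha():
                identities += 1
            elif ch == "+":
                plus_only += 1
    return identities, identities + plus_only


# Midline of the first block above (Query 1-60 vs Sbjct 739-796), as printed.
print(blast_counts(["L+V T + L N++T G K + ++ G LH Y R +KG D TG+ Y E++ I"]))  # (21, 32)
```

Summing the counts over every midline of an alignment gives the numerators of the "Identities" and "Positives" ratios; the denominator is the alignment length, gaps included.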

Based on this analysis, it is predicted that the proteins from N. meningitidis and N. gonorrhoeae, and
their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies.



Example 62 

The following partial DNA sequence was identified in N.meningitidis <SEQ ID 517>: 



1 . . TCAGGGAATA ACCTCAATGC CAAAGCTGCC GAAGTCAGCA GCGCAAACGG

51 TACACTCGCT GTGTCTGCCA ATAATGACAT CAACATCAGC GCAGGCATCA

101 ACACGACCCA TGTTGATGAT GCGTCCAAAC ACACAGGCAG AAGCGGTGGT

151 GGCAATAAAT TAGTCATTAC CGATAAAGCC CAAAGTCATC ACGAAACCGC

201 CCAAAGCAGC ACCTTTGAAG GCAAGCAAGT TGTATTGCAG GCAGGAAACG

251 ATGCCAACAT CCTTGGCAGC AATGTTATTT CCGATAATGG CACCCAGATT

301 CAAGCAGGCA ATCATGTTCG CATTGGTACA ACCCAAACTC AAAGCCAAAG

351 CGAAACCTAT CATCAAACCC AGAAATCAGG ATTGATGAGT GCAGGTATCG

401 GCTTCACTAT TGGCAGCAAG ACAAACACAC AAGAAAACCA ATCCCAAAGC

451 AACGAACATA CAGGCAGTAC CGTAGGCAGC TTGAAAGGCG ATACCACCAT

501 TGTTGCAGGC AAACACTACG AACAAATCGG CAGTACCGTT TCCAGCCCGG

551 AAGGCAACAA TACCATCTAT GCCCAAAGCA TAGACATTCA AGCGGCACAC

601 AACAAATTAA ACAGTAATAC CACCCAAACC TATGAACAAA AAGG.CTAAC

651 GGTGGCATTC AGTTCGCCCG TTACCGATTT GGCACAACAA



This corresponds to the amino acid sequence <SEQ ID 518; ORF117>:

1 . . SGNNLNAKAA EVSSANGTLA VSANNDINIS AGINTTHVDD ASKHTGRSGG 

51 GNKLVITDKA QSHHETAQSS TFEGKQVVLQ AGNDANILGS NVISDNGTQI

101 QAGNHVRIGT TQTQSQSETY HQTQKSGLMS AGIGFTIGSK TNTQENQSQS 

151 NEHTGSTVGS LKGDTTIVAG KHYEQIGSTV SSPEGNNTIY AQSIDIQAAH 

201 NKLNSNTTQT YEQKXLTVAF SSPVTDLAQQ . . . 

Computer analysis of this amino acid sequence gave the following results: 

Homology with the pspA putative secreted protein of N. meningitidis (accession number AF030941) 
ORF117 and the pspA protein show 45% aa identity in a 224aa overlap:

Orf117: 4 NLNAKAAEVSSANGTLAVSANNDINISAGINTTHVDDASKHTGRSGGGNKLVITDKAQSH 63

++ +AAEV S G L ++A DI + AG T +DA K+TGRSGGG K +T ++ 
pspA: 1173 DIRIRAAEVGSEQGRLKLAAGRDIKVEAGKAHTETEDALKYTGRSGGGIKQKMTRHLKNQ 1232 

Orf117: 64 HETAQSSTFEGKQVVLQAGNDANILGSNVISDNGTQIQAGNHVRIGTTQTQSQSETYHQT 123

+ AST +GK+++L +G D + GSN+I+DN T + A N++ + +T+S+S ++ 
pspA: 1233 NGQAVSGTLDGKEIILVSGRDITVTGSNIIADNHTILSAKNNIVLKAAETRSRSAEMNKK 1292 

Orf117: 124 QKSGLM-SAGIGFTIGSKTNTQENQSQSNEHTGSTVGSLKGDTTIVAGKHYEQIGSTVSS 182

+KSGLM S GIGFT GSK +TQ N+S++ HT S VGSL G+T I AGKHY Q GST+SS 
pspA: 1293 EKSGLMGSGGIGFTAGSKKDTQTNRSETVSHTESVVGSLNGNTLISAGKHYTQTGSTISS 1352

Orf117: 183 PEGNNTIYAQSIDIQAAHNKLNSNTTQTYEQKXLTVAFSSPVTD 226

P+G+ 1+ IIAAN++ +Q YEQK +TVA S PV + 
pspA: 1353 PQGDVGISSGKISIDAAQNRYSQESKQVYEQKGVTVAISVPVVN 1396

Homology with a predicted ORF from N. gonorrhoeae

ORF117 shows 90% identity over a 230aa overlap with a predicted ORF (ORF117ng) from 
N. gonorrhoeae: 

orf117.pep SGNNLNAKAAEVSSANGTLAVSANNDINIS 30

I I I I I I I I I I I I : I I : I I I I I 1:111:11 
orf117ng IHFDADNHTIRGSTNEVGSSIQTKGDVTLLSGNNLNAKAAEVGSAKGTLAVYAKNDITIS 480

orf117.pep AGINTTHVDDASKHTGRSGGGNKLVITDKAQSHHETAQSSTFEGKQVVLQAGNDANILGS 90

: I I : : : I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
orf117ng SGIHAGQVDDASKHTGRSGGGNKLVITDKAQSHHETAQSSTFEGKQVVLQAGNDANILGS 540



orf117.pep NVISDNGTQIQAGNHVRIGTTQTQSQSETYHQTQKSGLMSAGIGFTIGSKTNTQENQSQS 150

I I I I I I I I : I I I I I I I I I I I I I I I I I I I I I I I I I I 1 I I I I II I I I I I I I I I I I I I I II I I 

orf117ng NVISDNGTRIQAGNHVRIGTTQTQSQSETYHQTQKSGLMSAGIGFTIGSKTNTQENQSQS 600

orf117.pep NEHTGSTVGSLKGDTTIVAGKHYEQIGSTVSSPEGNNTIYAQSIDIQAAHNKLNSNTTQT 210

I I I I I I I i I I I I I I I I I I I : I I I I I I I : I I I I I I I I I : I I : I I I I : I : I I I : I I I I 

orf117ng NEHTGSTVGSLKGDTTIVASKHYEQTGSNVSSPEGNNLISTQSMDIGAAQNQLNSKTTQT 660

orf117.pep YEQKXLTVAFSSPVTDLAQQ 230
I I I I I I I I I I I I I I I I I I I 

orf117ng YEQKGLTVAFSSPVTDLAQQAIAVAHKAAKQFDKAKTTALMPWRLPMQVGRLFKQAKAPK 720

An ORF117ng nucleotide sequence <SEQ ID 519> was predicted to encode a protein having amino
acid sequence <SEQ ID 520>:



1 . . LLVQTEKDGL HNEQTFGEKK VFSENGKLHN YWRARRKGHD ETGHREQNYT 

51 LPEEITRDIS LGSFAYESHS KALSRHAPSQ GTELPQSNRD NIRTAKSNGI 

101 SLPYTPNSFT PLPGSSLYII NPANKGYLVE TDPRFANYRQ WLGSDYMLGS 

151 LKLDPNNLHK RLGDGYYEQR LINEQIAELT GHRRLDGYQN DEEQFKALMD 

201 NGATAARSMN LSVGIALSAE QAAQLTSDIV WLVQKEVKLP DGGTQTVLMP 

251 QVYVRVKNGG IDGKGALLSG SNTQINVSGS LKNSGTIAGR NALIINTDTL 

301 DNIGGRIHAQ KSAVTATQDI NNIGGILSAE QTLLLNAGNN INNQSTAKSS 

351 QNAQGSSTYL DRMAGIYITG KEKGVLAAQA GKDINIIAGQ ISNQSDQGQT 

401 RLQAGRDINL DTVQTGKYQE IHFDADNHTI RGSTNEVGSS IQTKGDVTLL 

451 SGNNLNAKAA EVGSAKGTLA VYAKNDITIS SGIHAGQVDD ASKHTGRSGG 

501 GNKLVITDKA QSHHETAQSS TFEGKQVVLQ AGNDANILGS NVISDNGTRI

551 QAGNHVRIGT TQTQSQSETY HQTQKSGLMS AGIGFTIGSK TNTQENQSQS 

601 NEHTGSTVGS LKGDTTIVAS KHYEQTGSNV SSPEGNNLIS TQSMDIGAAQ 

651 NQLNSKTTQT YEQKGLTVAF SSPVTDLAQQ AIAVAHKAAK QFDKAKTTAL

701 MPWRLPMQVG RLFKQAKAPK K* 

Further work revealed the following gonococcal partial DNA sequence <SEQ ID 521>:



1 TTGCTTGTGC AAACAGAAAA AGACGGTTTG CATAACGAGC AAACCTTTGG

51 CGAGAAGAAA GTCTTCAGCG AAAATGGTAA GTTGCACAAC TACTGGCGTG

101 CGCGTCGTAA AGGACATGAT GAAACAGGGC ATCGTGAACA AAATTATACT

151 TTGCCGGAGG AAATCACACG CGACATTTCA CTGGGTTCAT TTGCCTATGA

201 ATCGCATAGC AAAGCATTAA GCCGTCATGC GCCCAGCCAA GGCACTGAGT

251 TGCCACAAAG TAACCGGGAT AATATCCGTA CTGCGAAAAG CAACGGTATT

301 TCGCTACCCT ATACGCCCAA TTCTTTTACC CCATTACCCG GCAGCAGCTT

351 ATACATTATC AATCCTGCCA ATAAAGGCTA TCTTGTTGAA ACCGATCCAC

401 GCTTTGCCAA CTACCGTCAA TGGTTGGGTA GTGACTATAT GCTGGGCAGC

451 CTCAAACTAG ACCCAAACAA TTTACATAAA CGTTTGGGTG ATGGTTATTA

501 CGAGCAACGT TTAATCAATG AACAAATCGC AGAGCTGACA GGGCATCGTC

551 GTTTAGACGG TTATCAAAAC GACGAAGAAC AATTTAAAGC CTTAATGGAT

601 AATGGCGCGA CTGCGGCACG TTCGATGAAT CTCAGCGTTG GCATTGCATT

651 AAGTGCCGAG CAAGCAGCGC AACTGACCAG CGATATTGTT TGGTTGGTAC

701 AAAAAGAAGT TAAACTTCCT GATGGCGGCA CACAAACCGT ATTGATGCCA

751 CAGGTTTATG TACGCGTTAA AAATGGCGGC ATAGACGGTA AAGGTGCATT

801 GTTGTCAGGC AGCAATACAC AAATCAATGT TTCAGGCAGC CTGAAAAACT

851 CAGGCACGAT TGCAGGGCGC AATGCGCTTA TTATCAATAC CGATACGCTA

901 GACAATATCG GTGGGCGTAT TCATGCGCAA AAATCAGCGG TTACGGCCAC

951 ACAAGACATC AATAATATTG GCGGCATTCT TTCTGCCGAA CAGACATTAT

1001 TGCTCAATGC GGGTAACAAC ATCAACAACC AAAGCACGGC CAAGAGCAGT

1051 CAAAATGCAC AAGGTAGCAG CACCTACCTA GACCGAATGG CAGGTATTTA

1101 TATCACAGGC AAAGAAAAAG GTGTTTTAGC AGCGCAGGCA GGCAAAGACA

1151 TCAACATCAT TGCCGGTCAA ATCAGCAATC AATCAGATCA AGGGCAAACC 

1201 CGGCTGCAGG CAGGACGCGA CATTAACCTG GATACGGTAC AAACCGGCAA 

1251 ATATCAAGAA ATCCATTTTG ATGCCGATAA CCATACCATC CGAGGTTCAA 

1301 CGAACGAAGT CGGCAGCAGC ATTCAAACAA AAGGCGATGT TACCCtatTG 

1351 TCAGGGAATA ATCTCAATGC CAAAGCTGCC GAAGTCGGCA GCGCAAAAGG 

1401 CACACTTGCC GTGTATGCTA AAAATGACAT TACTATCAGC TCAGGCATCC 

1451 ATGCCGGCCA AGTTGATGAT GCGTCCAAAC ATACAGGCAG AAGCGGCGGC 

1501 GGTAATAAAT TAGTCATTAC CGATAAAGCC CAAAGTCATC ACGAAACTGC 

1551 TCAAAGCAGC ACCTTTGAAG GCAAGCAAGT TGTATTGCAG GCAGGAAACG 

1601 ATGCCAACAT CCTTGGCAGT AATGTTATTT CCGATAATGG CACCCGGATT 

1651 CAAGCAGGCA ATCATGTTCG CATTGGTACA ACCCAAACTC AAAGCCAAAG 

1701 CGAAACCTAT CATCAAACCC AAAAATCAGG ATTGATGAGT GCAGGTATCG 

1751 GCTTCACTAT TGGCAGCAAG ACAAACACAC AAGAAAACCA ATCCCAAAGC 

1801 AACGAACATA CAGGCAGTAC CGTAGGCAGC CTGAAAGGCG ATACCACCAT 

1851 TGTTGCAAGC AAACACTACG AACAAACCGG CAGCAACGTT TCCAGCCCTG 

1901 AGGGCAACAA CCTTATCAGC ACGCAAAGTA TGGATATTGG CGCAGCACAA 

1951 AACCAATTAA ACAGCAAAAC CACCCAAACC TACGAACAAA AAGGCTTAAC 

2001 GGTGGCATTC AGTTCGCCCG TTACCGATTT GGCACAACAA GCGATTGCCG 

2051 TAGCACACAA AGCAGCAAAC AAGTCGGACA AAGCAAAAAC GACCGCGTTA 

2101 ATGCCATGGC GGCTGCCAAT GCAGGTTGGC AGGCCTATCA AACAGGCAAA 

2151 GGCGCACAAA ACTTAG 

This corresponds to the amino acid sequence <SEQ ID 522; ORF117ng-1>:

1 LLVQTEKDGL HNEQTFGEKK VFSENGKLHN YWRARRKGHD ETGHREQNYT 

51 LPEEITRDIS LGSFAYESHS KALSRHAPSQ GTELPQSNRD NIRTAKSNGI

101 SLPYTPNSFT PLPGSSLYII NPANKGYLVE TDPRFANYRQ WLGSDYMLGS 

151 LKLDPNNLHK RLGDGYYEQR LINEQIAELT GHRRLDGYQN DEEQFKALMD 

201 NGATAARSMN LSVGIALSAE QAAQLTSDIV WLVQKEVKLP DGGTQTVLMP 

251 QVYVRVKNGG IDGKGALLSG SNTQINVSGS LKNSGTIAGR NALIINTDTL 

301 DNIGGRIHAQ KSAVTATQDI NNIGGILSAE QTLLLNAGNN INNQSTAKSS 

351 QNAQGSSTYL DRMAGIYITG KEKGVLAAQA GKDINIIAGQ ISNQSDQGQT 

401 RLQAGRDINL DTVQTGKYQE IHFDADNHTI RGSTNEVGSS IQTKGDVTLL 

451 SGNNLNAKAA EVGSAKGTLA VYAKNDITIS SGIHAGQVDD ASKHTGRSGG 

501 GNKLVITDKA QSHHETAQSS TFEGKQVVLQ AGNDANILGS NVISDNGTRI

551 QAGNHVRIGT TQTQSQSETY HQTQKSGLMS AGIGFTIGSK TNTQENQSQS 

601 NEHTGSTVGS LKGDTTIVAS KHYEQTGSNV SSPEGNNLIS TQSMDIGAAQ 

651 NQLNSKTTQT YEQKGLTVAF SSPVTDLAQQ AIAVAHKAAN KSDKAKTTAL 

701 MPWRLPMQVG RPIKQAKAHK T* 

ORF117ng-1 shows the same 90% identity over a 230aa overlap with ORF117. In addition, it
shows homology with a secreted N. meningitidis protein in the database:

gi|2623258 (AF030941) putative secreted protein [Neisseria meningitidis] Length = 2273

Score = 604 bits (1541), Expect = e-172

Identities = 325/678 (47%), Positives = 449/678 (65%), Gaps = 22/678 (3%)

Query: 1 LLVQTEKDGLHNEQTFGEKKVFSENGKLHNYWRARRKGHDETGHREQNYTLPEEITRDIS 60 

L+V T + L N++T G K + ++ G LH Y R +KG D TG+ Y E++ I 
Sbjct: 739 LIVGTPESALDNDETLGTKTI-TDKGDLHRYHRHHKKGRDSTGYSRSPYEPAPEVS-SIR 796 

Query: 61 LGSFAYESHSKALSRHAPSQGTELPQSNRDNIRTAKSNGISLPYTPNSFTPLPGSSLYII 120

+G AY+ + AP Q +++P + + NGI +T LP SSL+ I 

Sbjct: 797 MGISAYKGY APQQASDIPGTV VPWAENGIHPTFT LPNSSLFAI 840 

Query: 121 NPANKGYLVETDPRFANYRQWLGSDYMLGSLKLDPNNLHKRLGDGYYEQRLINEQIAELT 180 

P NKGYL+ETDP F +YR+WLGS YML +L+ DPN++HKRLGDGYYEQ+L+NEQIA+LT
Sbjct: 841 APNNKGYLIETDPAFTDYRKWLGSGYMLAALQQDPNHIHKRLGDGYYEQKLVNEQIAKLT 900 

Query: 181 GHRRLDGYQNDEEQFKALMDNGATAARSMNLSVGIALSAEQAAQLTSDIVWLVQKEVKLP 240

G+RRLDGY NDEEQFKALMDNG T A+ + L+ GIALSAEQ A+LTSDIVWL + V LP 
Sbjct: 901 GYRRLDGYTNDEEQFKALMDNGITIAKELQLTPGIALSAEQVARLTSDIVWLENETVTLP 960 

Query: 241 DGGTQTVLMPQVYVRVKNGGIDGKGALLSGSNTQINVSGSLKN-SGTIAGRNALIINTDT 299

DG TQTVL P+VYVR + ++G+GALLSGS I SG+++N G IAGR ALI+N 
Sbjct: 961 DGTTQTVLKPKVYVRARPKDMNGQGALLSGSVVDIG-SGAIENRGGLIAGREALILNAQN 1019



Query: 300 LDNIGGRIHAQKSAVTATQDINNIGGILSAEQTLLLNAGNNINNQSTAKSSQNAQGSSTY 359

+N+G++ ADINGIAE LLL A NNI ++S +S+QN QGS

Sbjct: 1020 IKNLQGDLQGKNIFAAAGSDITNTGSI-GAENALLLKASNNIESRSETRSNQNEQGSVRN 1078 

Query: 360 LDRMAGIYITGKEKGVLAAQAGKDINIIAGQISNQSDQGQTRLQAGRDINLDTVQTGKYQ 419 

+ R+AGIY+TG++ G + AG +I + A +++NQS+ GQT L AG DI DT + Q
Sbjct: 1079 IGRVAGIYLTGRQNGSVLLDAGNNIVLTASELTNQSEDGQTVLNAGGDIRSDTTGISRNQ 1138 

Query: 420 EIHFDADNHTIRGSTNEVGSSIQTKGDVTLLSGNNLNAKAAEVGSAKGTLAVYAKNDITI 479 

FD+DN+ IR NEVGS+I+T+G+++L + ++ +AAEVGS +G L + A DI + 
Sbjct: 1139 NTIFDSDNYVIRKEQNEVGSTIRTRGNLSLNAKGDIRIRAAEVGSEQGRLKLAAGRDIKV 1198 

Query: 480 SSGIHAGQVDDASKHTGRSGGGNKLVITDKAQSHHETAQSSTFEGKQVVLQAGNDANILG 539

+G + +DA K+TGRSGGG K +T ++ + A S T +GK+++L +G D + G 
Sbjct: 1199 EAGKAHTETEDALKYTGRSGGGIKQKMTRHLKNQNGQAVSGTLDGKEIILVSGRDITVTG 1258 

Query: 540 SNVISDNGTRIQAGNHVRIGTTQTQSQSETYHQTQKSGLM-SAGIGFTIGSKTNTQENQS 598

SN+I+DN T + A N++ + +T+S+S ++ +KSGLM S GIGFT GSK +TQ N+S 
Sbjct: 1259 SNIIADNHTILSAKNNIVLKAAETRSRSAEMNKKEKSGLMGSGGIGFTAGSKKDTQTNRS 1318 

Query: 599 QSNEHTGSTVGSLKGDTTIVASKHYEQTGSNVSSPEGNNLISTQSMDIGAAQNQLNSKTT 658 

++ HT S VGSL G+T I A KHY QTGS +SSP+G+ IS+ + I AAQN+ + ++ 
Sbjct: 1319 ETVSHTESVVGSLNGNTLISAGKHYTQTGSTISSPQGDVGISSGKISIDAAQNRYSQESK 1378

Query: 659 QTYEQKGLTVAFSSPVTD 676

Q YEQKG+TVA S PV + 
Sbjct: 1379 QVYEQKGVTVAISVPVVN 1396

Based on this analysis, it is predicted that the proteins from N. meningitidis and N. gonorrhoeae, and
their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies.



Example 63 

The following partial DNA sequence was identified in N.meningitidis <SEQ ID 523>: 

1 ATGATTTACA TCGTACTGTT TCTAGCTGTC GTCCTCGCCG TTGTCGCCTA 

51 CAACATGTAT CAGGAAAACC AATACCGCAA AAAAGTGCGC GACCAGTTCG 

101 GACACTCCGA CAAAGATGCC CTGCTCAACA GCAwAACCAG CCATGTCCGC 

151 GACGGCAAAC CGTCCGGCGG GTCAGTCATG ATGCCGAAAC CCCAACCGGC 

201 GGTCAAAAAA ACGGCAAAAC CCCAAGACCC CGyCATGCGC AACCTGCAAG 

251 AACAGGATGC CGTCTACATC GCCAAGCAGA AACAGGCAAA AGCCTCCCCG 

301 TTCAAAACCG AAATCGAAAC CGCCTTGGAA GAAAGCGGCA TTATCGGCAA 

351 CTCCGCCCAC ACCGTTTCCG AACCCCAAAC CGGACATTCC GCAACGAAAC 

401 CTGCCGACGC GTCGGCAAAA CCTGCACCCG TTCCGCAAAC ACCTGCAAAA 

451 CCGCTGATTA CGCTCAAAGA ACTGTCAAAA GTCGAATTAT CCTGGTTTGA 

501 CGTGCGCATC GACTTCATCT CCTAT . . . 

This corresponds to the amino acid sequence <SEQ ID 524; ORF119>:

1 MIYIVLFLAV VLAVVAYNMY QENQYRKKVR DQFGHSDKDA LLNSXTSHVR

51 DGKPSGGSVM MPKPQPAVKK TAKPQDPXMR NLQEQDAVYI AKQKQAKASP 

101 FKTEIETALE ESGIIGNSAH TVSEPQTGHS ATKPADASAK PAPVPQTPAK 

151 PLITLKELSK VELSWFDVRI DFISY. . . 

Further work revealed the complete nucleotide sequence <SEQ ID 525>: 

1 ATGATTTACA TCGTACTGTT TCTAGCTGTC GTCCTCGCCG TTGTCGCCTA 

51 CAACATGTAT CAGGAAAACC AATACCGCAA AAAAGTGCGC GACCAGTTCG 

101 GACACTCCGA CAAAGATGCC CTGCTCAACA GCAAAACCAG CCATGTCCGC 

151 GACGGCAAAC CGTCCGGCGG GTCAGTCATG ATGCCGAAAC CCCAACCGGC 

201 GGTCAAAAAA ACGGCAAAAC CCCAAGACCC CGCCATGCGC AACCTGCAAG 

251 AACAGGATGC CGTCTACATC GCCAAGCAGA AACAGGCAAA AGCCTCCCCG 

301 TTCAAAACCG AAATCGAAAC CGCCTTGGAA GAAAGCGGCA TTATCGGCAA 

351 CTCCGCCCAC ACCGTTTCCG AACCCCAAAC CGGACATTCC GCACCGAAAC 

401 CTGCCGACGC GCCGGCAAAA CCTGCACCCG TTCCGCAAAC ACCTGCAAAA 

451 CCGCTGATTA CGCTCAAAGA ACTGTCAAAA GTCGAATTAC CCTGGTTTGA 

501 CGTGCGCTTC GACTTCATCT CCTATATCGC GCTGACCGAA GCCAAAGAAC 

551 TGCACGCACT GCCGCGCCTT TCCAACCGCT GCCGCTACCA GATTGTCGGC 

601 TGCACCATGG ACGACCATTT CCAGATTGCC GAACCCATCC CGGGCATCCG 






651 CTATCAGGCA TTTATCGTGG GTATTCAGGC AGTCAGCCGC AACGGACTTG 

701 CCTCGCAGGA AGAACTCTCC GCATTCAACC GCCAGGTGGA CGCATTCGCA 

751 CAAAGCATGG GCGGTCAGAC GCTGCACACC GACCTTGCCG CCTTTATCGA 

801 AGTGGCTTCC GCACTGGACG CATTCTGCGC GCGCGTCGAC CAGACCATCG 

851 CCATCCATTT GGTTTCCCCG ACCAGCATCA GCGGCGTAGA ACTGCGTTCC 

901 GCCGTAACGG GCGTGGGTTT CGTTTTGGAA GACGACGGCG CGTTCCACTA 

951 TACCGACACG TCGGGCTCGA CCATGTTCTC CATCTGCTCG CTCAACAACG 

1001 AGCCGTTTAC CAACGCCCTT TTGGACAACC AGTCCTACAA AGGCTTCAGT 

1051 ATGCTGCTCG ACATCCCGCA CTCTCCGGCA GGCGAAAAAA CCTTCGACGA 

1101 TTTGTTTATG GATTTGGCGG TACGCCTGTC CGGCCAGTTG AACCTGAATC 

1151 TGGTCAACGA CAAAATGGAA GAAGTTTCGA CCCAATGGCT CAAAGACGTG 

1201 CGCACTTATG TATTGGCGCG TCAGTCCGAG ATGCTCAAAG TCGGTATCGA 

1251 ACCGGGCGGC AAAACCGCAT TGCGCCTGTT CTCCTAA 

This corresponds to the amino acid sequence <SEQ ID 526; ORF119-1>:

1 MIYIVLFLAV VLAVVAYNMY QENQYRKKVR DQFGHSDKDA LLNSKTSHVR

51 DGKPSGGSVM MPKPQPAVKK TAKPQDPAMR NLQEQDAVYI AKQKQAKASP 

101 FKTEIETALE ESGIIGNSAH TVSEPQTGHS APKPADAPAK PAPVPQTPAK 

151 PLITLKELSK VELPWFDVRF DFISYIALTE AKELHALPRL SNRCRYQIVG 

201 CTMDDHFQIA EPIPGIRYQA FIVGIQAVSR NGLASQEELS AFNRQVDAFA 

251 QSMGGQTLHT DLAAFIEVAS ALDAFCARVD QTIAIHLVSP TSISGVELRS 

301 AVTGVGFVLE DDGAFHYTDT SGSTMFSICS LNNEPFTNAL LDNQSYKGFS 

351 MLLDIPHSPA GEKTFDDLFM DLAVRLSGQL NLNLVNDKME EVSTQWLKDV 

401 RTYVLARQSE MLKVGIEPGG KTALRLFS* 
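The correspondence between a nucleotide listing and its amino acid sequence can be checked by translating in frame 1 with the standard genetic code. The following is a minimal Python sketch, assuming NCBI translation table 1 and that listing position numbers and spaces can simply be discarded; the helper translate is hypothetical. Codons containing IUPAC ambiguity codes (such as the w and y in <SEQ ID 523>) are rendered as X, matching the X residues in <SEQ ID 524>.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons enumerated in TCAG order;
# "*" marks a stop codon.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(listing):
    """Translate a DNA listing in frame 1, ignoring spaces and position numbers."""
    seq = "".join(ch for ch in listing.upper() if not ch.isspace() and not ch.isdigit())
    protein = []
    # Step through complete codons only; a trailing partial codon is dropped.
    for i in range(0, len(seq) - 2, 3):
        # Codons with ambiguity codes are not in the table and become "X".
        protein.append(CODON_TABLE.get(seq[i:i + 3], "X"))
    return "".join(protein)


print(translate("1 ATGATTTACA TCGTACTGTT TCTAGCTGTC GTCCTCGCCG TTGTCGCCTA"))  # MIYIVLFLAVVLAVVA
print(translate("CTGCTCAACA GCAwAACCAG CCATGTCCGC"))  # LLNSXTSHVR
```

The first call reproduces the opening residues of <SEQ ID 526> from the first line of <SEQ ID 525>. The second uses nucleotides 121-150 of <SEQ ID 523> and reproduces residues 41-50 of <SEQ ID 524>, where the ambiguous codon AwA gives X.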

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A) 

ORF119 shows 93.7% identity over a 175aa overlap with an ORF (ORF119a) from strain A of
N. meningitidis:

10 20 30 40 50 60 

orf119.pep MIYIVLFLAVVLAVVAYNMYQENQYRKKVRDQFGHSDKDALLNSXTSHVRDGKPSGGSVM
I I I I I I I II : I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I II 
orf119a MIYIVLFLAAVLAVVAYNMYQENQYRKKVRDQFGHSDKDALLNSKTSHVRDGKPSGGPVM

10 20 30 40 50 60 

70 80 90 100 110 120 

orf119.pep MPKPQPAVKKTAKPQDPXMRNLQEQDAVYIAKQKQAKASPFKTEIETALEESGIIGNSAH

I I I I I I I I I I I I I III I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
orf119a MPKPQPAVKKTAKSQDPAMRNLQEQDAVYIAKQKQAKASPFKTEIETALEESGIIGNSAH

70 80 90 100 110 120 

130 140 150 160 170 

orf119.pep TVSEPQTGHSATKPADASAKPAPVPQTPAKPLITLKELSKVELSWFDVRIDFISY

II MINIM Mill I I I M I I M I I II II I I M I I I I I I IIIIIMMII 

orf119a TVPEPQTGHSAPKPADAPAKPVPVPQTPAKPLITLKELSKVELPWFDVRFDFISYIALTE
130 140 150 160 170 180 

orf119a AKELHALPRLSNRCRYQIVGCTMDDHFQIAEPIPGIRYQAFIVGIQAVSRNGLASQEELS

190 200 210 220 230 240 

The complete length ORF119a nucleotide sequence <SEQ ID 527> is:

1 ATGATTTACA TCGTACTGTT CCTCGCCGCC GTCCTCGCCG TTGTCGCCTA 

51 CAATATGTAT CAGGAAAACC AATACCGCAA AAAAGTGCGC GACCAGTTCG 

101 GGCACTCCGA CAAAGATGCC CTGCTCAACA GCAAAACCAG CCATGTCCGC 

151 GACGGCAAAC CGTCCGGCGG GCCAGTCATG ATGCCGAAAC CCCAACCGGC 

201 GGTCAAAAAA ACGGCAAAAT CCCAAGACCC CGCCATGCGC AACCTGCAAG 

251 AGCAGGATGC CGTCTACATC GCCAAGCAGA AACAGGCAAA AGCCTCCCCG 

301 TTCAAAACCG AAATCGAAAC CGCCTTGGAA GAAAGCGGCA TTATCGGCAA 

351 CTCCGCCCAC ACCGTTCCCG AACCCCAAAC CGGACATTCC GCACCAAAAC 

401 CTGCCGACGC GCCGGCAAAA CCTGTTCCCG TTCCGCAAAC GCCGGCAAAA 

451 CCGCTGATTA CGCTCAAAGA GCTGTCGAAG GTCGAGCTGC CCTGGTTTGA 

501 CGTGCGCTTC GACTTCATCT CTTATATCGC GCTGACCGAA GCCAAAGAAC 

551 TGCACGCACT GCCGCGCCTT TCCAACCGCT GCCGCTACCA GATTGTCGGC 

601 TGCACCATGG ACGACCATTT CCAGATTGCC GAACCCATCC CGGGCATCCG 
651 CTATCAGGCA TTTATCGTGG GTATTCAGGC AGTCAGCCGC AACGGACTTG

701 CCTCGCAGGA AGAACTCTCC GCATTCAACC GCCAGGTGGA TGCATTCGCA 

751 CACAGCATGG GCGGTCAGAC GCTGCACACC GACCTTGCCG CCTTTATCGA 

801 AGTGGCTTCC GCACTGGACG CATTCTGCGC GCGCGTCGAC CAGACTATCG 

851 CCATCCATTT GGTTTCCCCG ACCAGCATCA GCGGCGTAGA ACTGCGTTCC 

901 GCCGTAACGG GCGTGGGTTT CGTTTTGGAA GACGACGGCG CGTTCCACTA 

951 TACCGACACG TCGGGCTCGA CCATGTTCTC CATCTGCTCG CTCAACAACG 

1001 AGCCGTTTAC CAATGCCCTT TTGGACAACC AGTCCTATAA AGGCTTCAGT 

1051 ATGCTGCTCG ACATCCCGCA CTCTCCGGCA GGCGAAAAAA CCTTCGACGA 

1101 TTTGTTTATG GATTTGGCGG TACGCCTGTC CGGCCAGTTG AACCTGAATC 

1151 TGGTCAACGA CAAAATGGAA GAAGTTTCGA CCCAATGGCT CAAAGACGTG 

1201 CGCACTTATG TATTGGCTCG TCAGTCCGAG ATGCTCAAAG TCGGTATCGA 

1251 ACCGGGCGGC AAAACCGCAT TGCGCCTGTT CTCCTAA 

This encodes a protein having amino acid sequence <SEQ ID 528>: 

1 MIYIVLFLAA VLAVVAYNMY QENQYRKKVR DQFGHSDKDA LLNSKTSHVR

51 DGKPSGGPVM MPKPQPAVKK TAKSQDPAMR NLQEQDAVYI AKQKQAKASP 

101 FKTEIETALE ESGIIGNSAH TVPEPQTGHS APKPADAPAK PVPVPQTPAK 

151 PLITLKELSK VELPWFDVRF DFISYIALTE AKELHALPRL SNRCRYQIVG 

201 CTMDDHFQIA EPIPGIRYQA FIVGIQAVSR NGLASQEELS AFNRQVDAFA 

251 HSMGGQTLHT DLAAFIEVAS ALDAFCARVD QTIAIHLVSP TSISGVELRS 

301 AVTGVGFVLE DDGAFHYTDT SGSTMFSICS LNNEPFTNAL LDNQSYKGFS 

351 MLLDIPHSPA GEKTFDDLFM DLAVRLSGQL NLNLVNDKME EVSTQWLKDV 

401 RTYVLARQSE MLKVGIEPGG KTALRLFS* 

ORF119a and ORF119-1 show 98.6% identity in a 428 aa overlap:

10 20 30 40 50 60 

orf119a.pep MIYIVLFLAAVLAVVAYNMYQENQYRKKVRDQFGHSDKDALLNSKTSHVRDGKPSGGPVM
I I I I I I I II : I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I t I I I i I I I I II ! I I M 
orf119-1 MIYIVLFLAVVLAVVAYNMYQENQYRKKVRDQFGHSDKDALLNSKTSHVRDGKPSGGSVM

10 20 30 40 50 60 

70 80 90 100 110 120 

orf119a.pep MPKPQPAVKKTAKSQDPAMRNLQEQDAVYIAKQKQAKASPFKTEIETALEESGIIGNSAH

I I I I I I I I I I I I 1 I I I I 1 I I I I I I I I I I I I I I I I I I I I I I I I I I I I 1 I II I I I I I II I I 
orf119-1 MPKPQPAVKKTAKPQDPAMRNLQEQDAVYIAKQKQAKASPFKTEIETALEESGIIGNSAH

70 80 90 100 110 120 

130 140 150 160 170 180 

orf119a.pep TVPEPQTGHSAPKPADAPAKPVPVPQTPAKPLITLKELSKVELPWFDVRFDFISYIALTE

II I I I I I I I I I I I I I I I I I I : I I I I I II I I I I I I I I I II I I I I I I I I I I I 1 I II I II I I 
orf119-1 TVSEPQTGHSAPKPADAPAKPAPVPQTPAKPLITLKELSKVELPWFDVRFDFISYIALTE

130 140 150 160 170 180 

190 200 210 220 230 240 

orf119a.pep AKELHALPRLSNRCRYQIVGCTMDDHFQIAEPIPGIRYQAFIVGIQAVSRNGLASQEELS
M I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I 1 I I I I I I i I I I I I I I I I I I I I I I I 
orf119-1 AKELHALPRLSNRCRYQIVGCTMDDHFQIAEPIPGIRYQAFIVGIQAVSRNGLASQEELS

190 200 210 220 230 240 

250 260 270 280 290 300 

orf119a.pep AFNRQVDAFAHSMGGQTLHTDLAAFIEVASALDAFCARVDQTIAIHLVSPTSISGVELRS
I | | I I I I I I I : I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
orf119-1 AFNRQVDAFAQSMGGQTLHTDLAAFIEVASALDAFCARVDQTIAIHLVSPTSISGVELRS

250 260 270 280 290 300 

310 320 330 340 350 360 

orf119a.pep AVTGVGFVLEDDGAFHYTDTSGSTMFSICSLNNEPFTNALLDNQSYKGFSMLLDIPHSPA

I I I I I I I I I I I I 1 I I I I I II I I I II I I I I I I II I I I I I I I I I I I M I I I I I I I I I I I I I I 
orf119-1 AVTGVGFVLEDDGAFHYTDTSGSTMFSICSLNNEPFTNALLDNQSYKGFSMLLDIPHSPA

310 320 330 340 350 360

370 380 390 400 410 420 

orf119a.pep GEKTFDDLFMDLAVRLSGQLNLNLVNDKMEEVSTQWLKDVRTYVLARQSEMLKVGIEPGG

II II I I I I I II I I I I I I I I I I I I II I I I I I II I I I I II I II I I I I I M I I I I I I I I I I I I 
orf119-1 GEKTFDDLFMDLAVRLSGQLNLNLVNDKMEEVSTQWLKDVRTYVLARQSEMLKVGIEPGG

370 380 390 400 410 420 
orf119a.pep KTALRLFSX
I I I I I I I I I 
orf119-1 KTALRLFSX
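The identity figures quoted in these examples are the number of identical aligned positions divided by the length of the overlap. For instance, ORF119a and ORF119-1 (each 428 residues long) differ at six positions, and 422/428 is approximately 98.6%. The following is a minimal Python sketch of that calculation; the helper percent_identity is hypothetical, it assumes an ungapped or dash-padded alignment, and because alignment programs differ in how they count gaps and overhangs it will not necessarily reproduce every figure reported here.

```python
def percent_identity(a, b):
    """Percent identity of two aligned, equal-length sequences.

    Assumed convention: identical, non-gap columns divided by the number of
    aligned columns; '-' marks a gap and never counts as identical.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    if not a:
        raise ValueError("empty alignment")
    matches = sum(1 for x, y in zip(a, b) if x == y and x != "-")
    return 100.0 * matches / len(a)


# Toy example: one mismatch in four aligned columns.
print(percent_identity("MIYI", "MIYV"))  # 75.0

# ORF119a vs ORF119-1: 428 columns with 6 substitutions, i.e. 422 identical positions.
print(round(100.0 * 422 / 428, 1))  # 98.6
# ORF119ng vs ORF119-1: 428 columns with 7 substitutions, i.e. 421 identical positions.
print(round(100.0 * 421 / 428, 1))  # 98.4
```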

Homology with a predicted ORF from N. gonorrhoeae 

ORF119 shows 93.1% identity over a 175aa overlap with a predicted ORF (ORF119ng) from 
N. gonorrhoeae: 

orf119.pep MIYIVLFLAVVLAVVAYNMYQENQYRKKVRDQFGHSDKDALLNSXTSHVRDGKPSGGSVM 60

I I I I I I I I I : i I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I II I I I I I I I I I I II 
orf119ng MIYIVLFLAAVLAVVAYNMYQENQYRKKVRDQFGHSDKDALLNSKTSHVRDGKPSGGPVM 60

orf119.pep MPKPQPAVKKTAKPQDPXMRNLQEQDAVYIAKQKQAKASPFKTEIETALEESGIIGNSAH 120

I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I 
orf119ng MPKPQPAVKKPAKPQDSAMRNLQEQDAVYIAKQKQAKASPFKTEIETALEEIGIIGNSAH 120

orf119.pep TVSEPQTGHSATKPADASAKPAPVPQTPAKPLITLKELSKVELSWFDVRIDFISY 175

I I II I I I I I I I Mill I I I : I I I I I I I I I I I I I I I I I I I I I I I I I I : I I I I I 
orf119ng TVSEPQTGHSAPKPADAPAKPVPVPQTPAKPLITLKELSKVELPWFDVRFDFISYIALTE 180

The complete length ORF119ng nucleotide sequence <SEQ ID 529> is:



1 ATGATTTACA TCGTACTGTT CCTCGCCGCC GTCCTCGCCG TTGTCGCCTA 

51 CAATATGTAT CAGGAAAACC AATACCGCAA AAAAGTGCGC GACCAGTTCG 

101 GACACTCCGA CAAAGATGCC CTGCTCAACA GCAAAACCAG CCATGTCCGC 

151 GACGGCAAAC CGTCCGGCGG GCCAGTCATG ATGCCGAAAC CCCAACCGGC 

201 GGTCAAAAAA CCGGCCAAAC CCCAAGACTC CGCCATGCGC AACCTGCAAG 

251 AACAGGATGC CGTCTACATC GCCAAGCAGA AACAGGCAAA AGCCTCCCCG 

301 TTCAAAACCG AAATCGAAAC CGCCTTGGAA GAAATCGGCA TTATCGGCAA 

351 CTCCGCCCAC ACCGTTTCCG AACCCCAAAC CGGACATTCC GCACCGAAAC 

401 CTGCCGACGC GCCGGCAAAA CCCGTTCCCG TTCCGCAAAC GCCGGCAAAA 

451 CCGCTGATTA CGCTCAAAGA GCTGTCGAAG GTCGAGCTGC CCTGGTTTGA 

501 CGTGCGCTtc gACTTCATCT CCTATATCGC GCTGACCGAA GCCAAAGAAC 

551 TGCACGCACT GCCGCGCCTT tccAACCGCT GCCGCTACCA GATTGTCGGC 

601 TGCACCATGG ACGACCATTT CCAGATTGCC GAACCCATCC CGGGCATCCG 

651 CTATCAGGCA TTTATCGTGG GTATCCAGGC AGTCAGCCGC AACGGACTTG 

701 CCTCGCAGGA AGAACTCTCC GCATTCAACC GCCAGGCGGA CGCATTCGCA 

751 CAAAGCATGG GCGGTCAGAC GCTGCACACC GACCTTGCCG CCTTTATCGA 

801 AGTGGCTTCC GCACTGGACG CATTCTGCGC GCGCGTCGAC CAGACCATCG 

851 CCATCCATTT GGTTTCGCCG ACCAGCATCA GCGGCGTAGA ACTGCGTTCC 

901 GCCGTAACGG GCGTGGGTTT CGTTTTGGAA GACGACGGCG CGTTCCACTA 

951 TACCGACACG TCGGGCTCGA CCATGTTCTC CATCTGCTCG CTCAACAACG 

1001 AGCCGTTTAC CAATGCCCTT TTGGACAACC AGTCCTACAA AGGCTTCAGT 

1051 ATGCTGCTCG ACATCCCGCA CTCTCCGGCA GGCGAAAAAA CCTTCGACGA 

1101 TTTGTTTATG GATTTGGCGG TACGCCTGTC CGGTCAGTTG AACCTGAATC 

1151 TGGTCAACGA CAAAATGGAA GAAGTTTCGA CCCAATGGCT CAAAGACGTA 

1201 CGCACTTATG TATTGGCGCG TCAGTCCGAG ATGCTCAAAG TCGGTATCGA 

1251 ACCGGGCGGC AAAACCGCCC TGCGCCTGTT TTCATAA 

This encodes a protein having amino acid sequence <SEQ ID 530>: 



1 MIYIVLFLAA VLAVVAYNMY QENQYRKKVR DQFGHSDKDA LLNSKTSHVR

51 DGKPSGGPVM MPKPQPAVKK PAKPQDSAMR NLQEQDAVYI AKQKQAKASP 

101 FKTEIETALE EIGIIGNSAH TVSEPQTGHS APKPADAPAK PVPVPQTPAK 

151 PLITLKELSK VELPWFDVRF DFISYIALTE AKELHALPRL SNRCRYQIVG 

201 CTMDDHFQIA EPIPGIRYQA FIVGIQAVSR NGLASQEELS AFNRQADAFA 

251 QSMGGQTLHT DLAAFIEVAS ALDAFCARVD QTIAIHLVSP TSISGVELRS 

301 AVTGVGFVLE DDGAFHYTDT SGSTMFSICS LNNEPFTNAL LDNQSYKGFS 

351 MLLDIPHSPA GEKTFDDLFM DLAVRLSGQL NLNLVNDKME EVSTQWLKDV 

401 RTYVLARQSE MLKVGIEPGG KTALRLFS* 

ORF119ng and ORF119-1 show 98.4% identity over a 428 aa overlap:



10 20 30 40 50 60 

orf119ng MIYIVLFLAAVLAVVAYNMYQENQYRKKVRDQFGHSDKDALLNSKTSHVRDGKPSGGPVM
I I I I I I I I I : I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 1 I I I I I II i I II 
orf119-1 MIYIVLFLAVVLAVVAYNMYQENQYRKKVRDQFGHSDKDALLNSKTSHVRDGKPSGGSVM

10 20 30 40 50 60 

70 80 90 100 110 120 

orf119ng MPKPQPAVKKPAKPQDSAMRNLQEQDAVYIAKQKQAKASPFKTEIETALEEIGIIGNSAH

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I IMNIII 
orf119-1 MPKPQPAVKKTAKPQDPAMRNLQEQDAVYIAKQKQAKASPFKTEIETALEESGIIGNSAH

70 80 90 100 110 120 

130 140 150 160 170 180 

orf119ng TVSEPQTGHSAPKPADAPAKPVPVPQTPAKPLITLKELSKVELPWFDVRFDFISYIALTE
I I I I I I I I I I I I I I I I I I I I I : I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
orf119-1 TVSEPQTGHSAPKPADAPAKPAPVPQTPAKPLITLKELSKVELPWFDVRFDFISYIALTE

130 140 150 160 170 180 

                   190       200       210       220       230       240
orf119ng    AKELHALPRLSNRCRYQIVGCTMDDHFQIAEPIPGIRYQAFIVGIQAVSRNGLASQEELS
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf119-1    AKELHALPRLSNRCRYQIVGCTMDDHFQIAEPIPGIRYQAFIVGIQAVSRNGLASQEELS
                   190       200       210       220       230       240

                   250       260       270       280       290       300
orf119ng    AFNRQADAFAQSMGGQTLHTDLAAFIEVASALDAFCARVDQTIAIHLVSPTSISGVELRS
            |||||:||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf119-1    AFNRQVDAFAQSMGGQTLHTDLAAFIEVASALDAFCARVDQTIAIHLVSPTSISGVELRS
                   250       260       270       280       290       300

                   310       320       330       340       350       360
orf119ng    AVTGVGFVLEDDGAFHYTDTSGSTMFSICSLNNEPFTNALLDNQSYKGFSMLLDIPHSPA
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf119-1    AVTGVGFVLEDDGAFHYTDTSGSTMFSICSLNNEPFTNALLDNQSYKGFSMLLDIPHSPA
                   310       320       330       340       350       360

                   370       380       390       400       410       420
orf119ng    GEKTFDDLFMDLAVRLSGQLNLNLVNDKMEEVSTQWLKDVRTYVLARQSEMLKVGIEPGG
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf119-1    GEKTFDDLFMDLAVRLSGQLNLNLVNDKMEEVSTQWLKDVRTYVLARQSEMLKVGIEPGG
                   370       380       390       400       410       420






429 

orf119ng    KTALRLFSX
            |||||||||
orf119-1    KTALRLFSX

Based on this analysis, including the presence of a putative leader sequence in the gonococcal 
protein, it is predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, 
could be useful antigens for vaccines or diagnostics, or for raising antibodies. 
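
The identity figures quoted in these examples amount to the number of identical aligned residues divided by the length of the overlap. For ORF119, 98.4% of 428 positions corresponds to 421 identical residues (421/428 = 98.36%), i.e. seven substitutions. The sketch below shows that calculation under stated assumptions: two alignment rows of equal length, '-' as the gap symbol with gap columns left out of the overlap, and the unknown residue X never counted as identical. The terminal stop symbol should be removed first (the 428 aa overlap quoted for the 429-column ORF119 alignment suggests the stop is not counted). The function name is hypothetical, and the alignment programs actually used may differ in detail.

```python
def percent_identity(row_a: str, row_b: str) -> tuple[int, int, float]:
    """Return (identical, overlap, percent) for two aligned rows of equal length.

    Assumed conventions (not taken from any particular alignment program):
    '-' marks a gap and gap columns are excluded from the overlap;
    'X' (unknown residue) is counted in the overlap but never as identical.
    """
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    overlap = identical = 0
    for a, b in zip(row_a.upper(), row_b.upper()):
        if a == "-" or b == "-":
            continue  # gap column: not part of the overlap
        overlap += 1
        if a == b and a != "X":
            identical += 1
    percent = 100.0 * identical / overlap if overlap else 0.0
    return identical, overlap, percent


# Second block (residues 61-120) of the ORF119ng / ORF119-1 alignment above.
ng = "MPKPQPAVKKPAKPQDSAMRNLQEQDAVYIAKQKQAKASPFKTEIETALEEIGIIGNSAH"
m1 = "MPKPQPAVKKTAKPQDPAMRNLQEQDAVYIAKQKQAKASPFKTEIETALEESGIIGNSAH"
print(percent_identity(ng, m1))  # (57, 60, 95.0)
```

Summing the identical counts and overlaps over all blocks, rather than averaging block percentages, is what reproduces a whole-alignment figure such as 421/428.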



Example 64 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 531>: 






1 . . GCGCGGCACG GCACGGAAGA TTTCTTCATG AACAACAGCG ACAC.ATCAG

51 GCAGATAGTC GAAAGCACCA CCGGTACGAT GAAGCTGCTG ATTTCCTCCA

101 TCGCCCTGAT TTCATTGGTA GTCGGCGGCA TCGGCGTGAT GAACATCATG

151 CTGGTGTCCG TTACCGAGCG CACCAAAGAA ATCGGCATAC GGATGGCAAT

201 CGGCGCGCGG CGCGGCAATA TTTyGCAGCA GTTTTTGATT GAGGCGGTGT

251 TAATCTGCGT CATCGGCGGT TTGGTCGGCG TGGGTTTGTC CGCCGCCGTC

301 AGCCTCGTGT TCAATCATTT TGTAACCGAC TTCCCGATGG ACATTTCCGC

351 CATGTCCGTC ATCGGCGCGG TCGCCTGTTC GACCGGAATC GGCATCGCGT

401 TCGGCTTTAT GCCTGCCAAT AAAGCAGCCA AACTCAATCC GATAGACGCA

451 TTGGCACAGG ATTGA





This corresponds to the amino acid sequence <SEQ ID 532; ORF134>: 

1 . . ARHGTEDFFM NNSDXIRQIV ESTTGTMKLL ISSIALISLV VGGIGVMNIM 

51 LVSVTERTKE IGIRMAIGAR RGNIXQQFLI EAVLICVIGG LVGVGLSAAV 

101 SLVFNHFVTD FPMDISAMSV IGAVACSTGI GIAFGFMPAN KAAKLNPIDA 

151 LAQD* 
Further work revealed the complete nucleotide sequence <SEQ ID 533>: 



1 ATGTCGGTGC AAGCAGTATT GGCGCACAAA ATGCGTTCGC TTCTGACGAT

51 GCTCGGCATC ATCATCGGTA TCGCGTCGGT GGTTTCCGTC GTCGCATTGG

101 GCAATGGTTC GCAGAAAAAA ATCCTTGAAG ACATCAGTTC GATAGGGACG

151 AACACCATCA GCATCTTCCC GGGGCGCGGC TTCGGCGACA GGCGCAGCGG

201 CAGGATTAAA ACCCTGACCA TAGACGACGC AAAAATCATC GCCAAACAAA

251 GCTACGTTGC TTCCGCCACG CCCATGACTT CGAGCGGCGG CACGCTGACT

301 TACCGCAACA CCGACCTGAC CGCCTCGCTT TACGGCGTGG GCGAACAATA

351 TTTCGACGTG CGCGGACTGA AGCTGGAAAC GGGGCGGCTG TTTGACGAAA

401 ACGATGTGAA AGAAGACGCG CAGGTCGTCG TCATCGACCA AAATGTCAAA

451 GACAAACTCT TTGCGGACTC GGATCCGTTG GGTAAAACCA TTTTGTTCAG

501 GAAACGCCCC TTGACCGTCA TCGGCGTGAT GAAAAAAGAC GAAAACGCTT

551 TCGGCAATTC CGACGTGCTG ATGCTTTGGT CGCCCTATAC GACGGTGATG

601 CACCAAATCA CAGGCGAGAG CCACACCAAC TCCATCACCG TCAAAATCAA

651 AGACAATGCC AATACCCAGG TTGCCGAAAA AGGGCTGACC GATCTGCTCA

701 AAGCGCGGCA CGGCACGGAA GATTTCTTCA TGAACAACAG CGACAGCATC

751 AGGCAGATAG TCGAAAGCAC CACCGGTACG ATGAAGCTGC TGATTTCCTC

801 CATCGCCCTG ATTTCATTGG TAGTCGGCGG CATCGGCGTG ATGAACATCA

851 TGCTGGTGTC CGTTACCGAG CGCACCAAAG AAATCGGCAT ACGGATGGCA

901 ATCGGCGCGC GGCGCGGCAA TATTTTGCAG CAGTTTTTGA TTGAGGCGGT

951 GTTAATCTGC GTCATCGGCG GTTTGGTCGG CGTGGGTTTG TCCGCCGCCG

1001 TCAGCCTCGT GTTCAATCAT TTTGTAACCG ACTTCCCGAT GGACATTTCC

1051 GCCATGTCCG TCATCGGCGC GGTCGCCTGT TCGACCGGAA TCGGCATCGC

1101 GTTCGGCTTT ATGCCTGCCA ATAAAGCAGC CAAACTCAAT CCGATAGACG

1151 CATTGGCACA GGATTGA



This corresponds to the amino acid sequence <SEQ ID 534; ORF134-1>: 



1 MSVQAVLAHK MRSLLTMLGI IIGIASVVSV VALGNGSQKK ILEDISSIGT

51 NTISIFPGRG FGDRRSGRIK TLTIDDAKII AKQSYVASAT PMTSSGGTLT

101 YRNTDLTASL YGVGEQYFDV RGLKLETGRL FDENDVKEDA QVVVIDQNVK

151 DKLFADSDPL GKTILFRKRP LTVIGVMKKD ENAFGNSDVL MLWSPYTTVM

201 HQITGESHTN SITVKIKDNA NTQVAEKGLT DLLKARHGTE DFFMNNSDSI

251 RQIVESTTGT MKLLISSIAL ISLVVGGIGV MNIMLVSVTE RTKEIGIRMA

301 IGARRGNILQ QFLIEAVLIC VIGGLVGVGL SAAVSLVFNH FVTDFPMDIS

351 AMSVIGAVAC STGIGIAFGF MPANKAAKLN PIDALAQD*

Computer analysis of this amino acid sequence gave the following results: 



Homology with the hypothetical protein o648 of E. coli (accession number AE000189) 
ORF134 and the o648 protein show 45% aa identity in a 153 aa overlap: 

Orf134:   2 RHGTEDFFMNNSDXIRQIVESTTGTMKXXXXXXXXXXXVVGGIGVMNIMLVSVTERTKEI 61
            RHG +DFF  N D + + VE TT T++           VVGGIGVMNIMLVSVTERT+EI
o648:   496 RHGKKDFFTWNMDGVLKTVEKTTRTLQLFLTLVAVISLVVGGIGVMNIMLVSVTERTREI 555

Orf134:  62 GIRMAIGARRGNIXQQFLIEAXXXXXXXXXXXXXXXXXXXXXFNHFVTDFPMDISAMSVI 121
            GIRMA+GAR  ++ QQFLIEA                        F+  + +  S ++++
o648:   556 GIRMAVGARASDVLQQFLIEAVLVCLVGGALGITLSLLIAFTLQLFLPGWEIGFSPLALL 615

Orf134: 122 GAVACSTGIGIAFGFMPANKAAKLNPIDALAQD 154
             A  CST  GI FG++PA  AA+L+P+DALA++
o648:   616 LAFLCSTVTGILFGWLPARNAARLDPVDALARE 648



Homology with a predicted ORF from N. meningitidis (strain A) 

ORF134 shows 98.7% identity over a 154 aa overlap with an ORF (ORF134a) from strain A of N. meningitidis: 



                                                  10        20        30
orf134.pep                                ARHGTEDFFMNNSDXIRQIVESTTGTMKLL
                                          |||||||||||||| |||||||||||||||
orf134a     GESHTNSITVKIKDNANTQVAEKGLTDLLKARHGTEDFFMNNSDSIRQIVESTTGTMKLL
               210       220       230       240       250       260

                    40        50        60        70        80        90
orf134.pep  ISSIALISLVVGGIGVMNIMLVSVTERTKEIGIRMAIGARRGNIXQQFLIEAVLICVIGG
            |||||||||||||||||||||||||||||||||||||||||||| |||||||||||||||
orf134a     ISSIALISLVVGGIGVMNIMLVSVTERTKEIGIRMAIGARRGNILQQFLIEAVLICVIGG
               270       280       290       300       310       320

                   100       110       120       130       140       150
orf134.pep  LVGVGLSAAVSLVFNHFVTDFPMDISAMSVIGAVACSTGIGIAFGFMPANKAAKLNPIDA
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf134a     LVGVGLSAAVSLVFNHFVTDFPMDISAMSVIGAVACSTGIGIAFGFMPANKAAKLNPIDA
               330       340       350       360       370       380

orf134.pep  LAQDX
            |||||
orf134a     LAQDX

The complete length ORF134a nucleotide sequence <SEQ ID 535> is: 

1 ATGTCGGTGC AAGCAGTATT GGCGCACAAA ATGCGTTCGC TTCTGACGAT 

51 GCTCGGCATC ATCATCGGTA TCGCTTCGGT TGTCTCCGTC GTCGCATTGG 

101 GCAACGGTTC GCAGAAAAAA ATCCTTGAAG ACATCAGTTC GATAGGGACG 

151 AACACCATCA GCATCTTCCC AGGGCGCGGC TTCGGCGACA GGCGCAGCGG 

201 CAGGATTAAA ACCCTGACCA TAGACGACGC AAAAATCATC GCCAAACAAA 

251 GCTACGTTGC TTCCGCCACG CCCATGACTT CGAGCGGCGG CACGCTGACT 

301 TACCGCAATA CCGACCTGAC CGCTTCTTTG TACGGTGTGG GCGAACAATA 

351 TTTCGACGTG CGCGGGCTGA AGCTGGAAAC GGGGCGGCTG TTTGACGAAA 

401 ACGATGTGAA AGAAGACGCG CAGGTCGTCG TCATCGACCA AAATGTCAAA 

451 GACAAACTCT TTGCGGACTC GGATCCGTTG GGTAAAACCA TTTTGTTCAG 

501 GAAACGCCCC TTGACCGTCA TCGGCGTGAT GAAAAAAGAC GAAAACGCTT 

551 TCGGCAATTC CGACGTGCTG ATGCTTTGGT CGCCCTATAC GACGGTGATG 

601 CACCAAATCA CAGGCGAGAG CCACACCAAC TCCATCACCG TCAAAATCAA 

651 AGACAATGCC AATACCCAGG TTGCCGAAAA AGGGCTGACC GATCTGCTCA 

701 AAGCGCGGCA CGGCACGGAA GATTTCTTCA TGAACAACAG CGACAGCATC 

751 AGGCAGATAG TCGAAAGCAC CACCGGTACG ATGAAGCTGC TGATTTCCTC 

801 CATCGCCCTG ATTTCATTGG TAGTCGGCGG CATCGGCGTG ATGAACATCA 

851 TGCTGGTGTC CGTTACCGAG CGCACCAAAG AAATCGGCAT ACGGATGGCA 

901 ATCGGCGCGC GGCGCGGCAA TATTTTGCAG CAGTTTTTGA TTGAGGCGGT 

951 GTTAATCTGC GTCATCGGCG GTTTGGTCGG CGTGGGTTTG TCCGCCGCCG 

1001 TCAGCCTCGT GTTCAATCAT TTTGTAACCG ACTTCCCGAT GGACATTTCC 

1051 GCCATGTCCG TCATCGGCGC GGTCGCCTGT TCGACCGGAA TCGGCATCGC 

1101 GTTCGGCTTT ATGCCTGCCA ATAAAGCAGC CAAACTCAAT CCGATAGATG 

1151 CATTGGCGCA GGATTGA 

This encodes a protein having amino acid sequence <SEQ ID 536>: 

1 MSVQAVLAHK MRSLLTMLGI IIGIASVVSV VALGNGSQKK ILEDISSIGT

51 NTISIFPGRG FGDRRSGRIK TLTIDDAKII AKQSYVASAT PMTSSGGTLT

101 YRNTDLTASL YGVGEQYFDV RGLKLETGRL FDENDVKEDA QVVVIDQNVK

151 DKLFADSDPL GKTILFRKRP LTVIGVMKKD ENAFGNSDVL MLWSPYTTVM

201 HQITGESHTN SITVKIKDNA NTQVAEKGLT DLLKARHGTE DFFMNNSDSI

251 RQIVESTTGT MKLLISSIAL ISLVVGGIGV MNIMLVSVTE RTKEIGIRMA

301 IGARRGNILQ QFLIEAVLIC VIGGLVGVGL SAAVSLVFNH FVTDFPMDIS

351 AMSVIGAVAC STGIGIAFGF MPANKAAKLN PIDALAQD*

ORF134a and ORF134-1 show 100.0% identity in 388 aa overlap: 

orf134a.pep  MSVQAVLAHKMRSLLTMLGIIIGIASVVSVVALGNGSQKKILEDISSIGTNTISIFPGRG
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf134-1     MSVQAVLAHKMRSLLTMLGIIIGIASVVSVVALGNGSQKKILEDISSIGTNTISIFPGRG

orf134a.pep  FGDRRSGRIKTLTIDDAKIIAKQSYVASATPMTSSGGTLTYRNTDLTASLYGVGEQYFDV
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf134-1     FGDRRSGRIKTLTIDDAKIIAKQSYVASATPMTSSGGTLTYRNTDLTASLYGVGEQYFDV

orf134a.pep  RGLKLETGRLFDENDVKEDAQVVVIDQNVKDKLFADSDPLGKTILFRKRPLTVIGVMKKD
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf134-1     RGLKLETGRLFDENDVKEDAQVVVIDQNVKDKLFADSDPLGKTILFRKRPLTVIGVMKKD

orf134a.pep  ENAFGNSDVLMLWSPYTTVMHQITGESHTNSITVKIKDNANTQVAEKGLTDLLKARHGTE
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf134-1     ENAFGNSDVLMLWSPYTTVMHQITGESHTNSITVKIKDNANTQVAEKGLTDLLKARHGTE

orf134a.pep  DFFMNNSDSIRQIVESTTGTMKLLISSIALISLVVGGIGVMNIMLVSVTERTKEIGIRMA
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf134-1     DFFMNNSDSIRQIVESTTGTMKLLISSIALISLVVGGIGVMNIMLVSVTERTKEIGIRMA

orf134a.pep  IGARRGNILQQFLIEAVLICVIGGLVGVGLSAAVSLVFNHFVTDFPMDISAMSVIGAVAC
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf134-1     IGARRGNILQQFLIEAVLICVIGGLVGVGLSAAVSLVFNHFVTDFPMDISAMSVIGAVAC

orf134a.pep  STGIGIAFGFMPANKAAKLNPIDALAQDX
             |||||||||||||||||||||||||||||
orf134-1     STGIGIAFGFMPANKAAKLNPIDALAQDX



Homology with a predicted ORF from N. gonorrhoeae 

ORF134 shows 96.8% identity over a 154 aa overlap with a predicted ORF (ORF134ng) from N. gonorrhoeae: 

orf134.pep                                ARHGTEDFFMNNSDXIRQIVESTTGTMKLL 30
                                          |||||||||||||| |||:|||||||||||
orf134ng    GESHTNSITVKIKDNANTRVAEKGLAELLKARHGTEDFFMNNSDSIRQMVESTTGTMKLL 264

orf134.pep  ISSIALISLVVGGIGVMNIMLVSVTERTKEIGIRMAIGARRGNIXQQFLIEAVLICVIGG 90
            |||||||||||||||||||||||||||||||||||||||||||| |||||||||||:|||
orf134ng    ISSIALISLVVGGIGVMNIMLVSVTERTKEIGIRMAIGARRGNILQQFLIEAVLICIIGG 324

orf134.pep  LVGVGLSAAVSLVFNHFVTDFPMDISAMSVIGAVACSTGIGIAFGFMPANKAAKLNPIDA 150
            ||||||||||||||||||||||||||| ||||||||||||||||||||||||||||||||
orf134ng    LVGVGLSAAVSLVFNHFVTDFPMDISAASVIGAVACSTGIGIAFGFMPANKAAKLNPIDA 384

orf134.pep  LAQD 154
            ||||
orf134ng    LAQD 388

The complete length ORF134ng nucleotide sequence <SEQ ID 537> is: 



1 ATGTCGGTGC AAGCAGTATT GGCGCACAAA ATGCGTTCGC TTCTGACCAT 

51 GCTCGGCATC ATCATCGGTA TCGCTTCGGT TGTCTCCGTC GTCGCGCTGG 

101 GCAACGGTTC GCAGAAAAAA ATCCTCGAAG ACATCAGTTC GATGGGGACG 

151 AACACCATCA GCATCTTCCC CGGGCGCGGC TTCGGCGACA GGCGCAGCGG 

201 CAAAATCAAA ACCCTGACCA TAGACGACGC AAAAATCATC GCCAAACAAA 

251 GCTACGTTGC CTCCGCCACG CCCATGACTT CGAGCGGCGG CACGCTGACC 

301 TACCGCAATA CCGACCTGAC CGCTTCTTTG TACGGTGTGG GCGAACAATA 

351 TTTCGACGTG CGCGGGCTGA AGCTGGAAAC GGGGCGGCTG TTTGATGAGA 

401 ACGATGTGAA AGAAGACGCG CAAGTCGTCG TCATCGACCA AAATGTCAAA 

451 GACAAACTCT TTGCGGACTC GGATCCGTTG GGTAAAACCA TTTTGTTCAG 

501 GAAACGCCCC TTGACCGTCA TCGGCGTGAT GAAAAAAGAC GAAAACGCTT 

551 TCGGCAATTC CGACGTGCTG ATGCTTTGGT CGCCCTATAC GACGGTGATG 

601 CACCAAATCA CAGGCGAGAG CCACACCAAC TCCATCACCG TCAAAATCAA 

651 AGACAATGCC AATACCCGGG TTGCCGAAAA AGGGCTGGCC GAGCTGCTCA 

701 AAGCACGGCA CGGCACGGAA GACTTCTTTA TGAACAACAG CGACAGCATC 

751 AGGCAGATGG TCGAAAGCAC CACCGGTACG ATGAAGCTGC TGATTTCCTC 

801 CATCGCCCTG ATTTCATTGG TAGTCGGCGG CATCGGTGTG ATGAACATTA 

851 TGCTGGTGTC CGTTACCGAG CGCACCAAAG AAATCGGCAT ACGGATGGCA 

901 ATCGGCGCGC GGCGCGGCAA TATTTTGCAG CAGTTTTTGA TTGAGGCGGT 

951 GTTAATCTGC ATCATCGGAG GCTTGGTCGG CGTAGGTTTG TCCGCCGCCG 

1001 TCAGCCTCGT GTTCAATCAT TTTGTAACCG ATTTCCCGAT GGACATTTCG 

1051 GCGGCATCCG TTATCGGGGC GGTCGCCTGT TCGACCGGAA TCGGCATCGC 

1101 GTTCGGCTTT ATGCCTGCCA ATAAGGCAGC CAAACTCAAT CCGATAGATG 

1151 CATTGGCGCA GGATTGA 

This encodes a protein having amino acid sequence <SEQ ID 538>: 



1 MSVQAVLAHK MRSLLTMLGI IIGIASVVSV VALGNGSQKK ILEDISSMGT

51 NTISIFPGRG FGDRRSGKIK TLTIDDAKII AKQSYVASAT PMTSSGGTLT

101 YRNTDLTASL YGVGEQYFDV RGLKLETGRL FDENDVKEDA QVVVIDQNVK

151 DKLFADSDPL GKTILFRKRP LTVIGVMKKD ENAFGNSDVL MLWSPYTTVM

201 HQITGESHTN SITVKIKDNA NTRVAEKGLA ELLKARHGTE DFFMNNSDSI

251 RQMVESTTGT MKLLISSIAL ISLVVGGIGV MNIMLVSVTE RTKEIGIRMA

301 IGARRGNILQ QFLIEAVLIC IIGGLVGVGL SAAVSLVFNH FVTDFPMDIS

351 AASVIGAVAC STGIGIAFGF MPANKAAKLN PIDALAQD*

ORF134ng and ORF134-1 show 97.9% identity in 388 aa overlap: 

orf134ng     MSVQAVLAHKMRSLLTMLGIIIGIASVVSVVALGNGSQKKILEDISSMGTNTISIFPGRG
             |||||||||||||||||||||||||||||||||||||||||||||||:||||||||||||
orf134-1     MSVQAVLAHKMRSLLTMLGIIIGIASVVSVVALGNGSQKKILEDISSIGTNTISIFPGRG

orf134ng     FGDRRSGKIKTLTIDDAKIIAKQSYVASATPMTSSGGTLTYRNTDLTASLYGVGEQYFDV
             |||||||:||||||||||||||||||||||||||||||||||||||||||||||||||||
orf134-1     FGDRRSGRIKTLTIDDAKIIAKQSYVASATPMTSSGGTLTYRNTDLTASLYGVGEQYFDV

orf134ng     RGLKLETGRLFDENDVKEDAQVVVIDQNVKDKLFADSDPLGKTILFRKRPLTVIGVMKKD
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf134-1     RGLKLETGRLFDENDVKEDAQVVVIDQNVKDKLFADSDPLGKTILFRKRPLTVIGVMKKD

orf134ng     ENAFGNSDVLMLWSPYTTVMHQITGESHTNSITVKIKDNANTRVAEKGLAELLKARHGTE
             ||||||||||||||||||||||||||||||||||||||||||:||||||::|||||||||
orf134-1     ENAFGNSDVLMLWSPYTTVMHQITGESHTNSITVKIKDNANTQVAEKGLTDLLKARHGTE

orf134ng     DFFMNNSDSIRQMVESTTGTMKLLISSIALISLVVGGIGVMNIMLVSVTERTKEIGIRMA
             ||||||||||||:|||||||||||||||||||||||||||||||||||||||||||||||
orf134-1     DFFMNNSDSIRQIVESTTGTMKLLISSIALISLVVGGIGVMNIMLVSVTERTKEIGIRMA

orf134ng     IGARRGNILQQFLIEAVLICIIGGLVGVGLSAAVSLVFNHFVTDFPMDISAASVIGAVAC
             ||||||||||||||||||||:|||||||||||||||||||||||||||||| ||||||||
orf134-1     IGARRGNILQQFLIEAVLICVIGGLVGVGLSAAVSLVFNHFVTDFPMDISAMSVIGAVAC

orf134ng     STGIGIAFGFMPANKAAKLNPIDALAQDX
             |||||||||||||||||||||||||||||
orf134-1     STGIGIAFGFMPANKAAKLNPIDALAQDX
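
The 97.9% identity reported for ORF134ng against ORF134-1 over 388 aa corresponds to 380 identical positions, i.e. eight substitutions. Listing those positions from gap-free alignment rows is a simple column comparison. The helper below is a hypothetical illustration (its name and the `start` parameter are not from the source); it assumes rows of equal length without gaps, whose first column is residue number `start`.

```python
def substitutions(row_a: str, row_b: str, start: int = 1) -> list[tuple[int, str, str]]:
    """Return (position, residue_in_a, residue_in_b) for every column that differs.

    Assumes gap-free rows of equal length; positions are numbered from `start`,
    the residue number of the first column of the block.
    """
    if len(row_a) != len(row_b):
        raise ValueError("rows must have the same length")
    return [
        (start + offset, a, b)
        for offset, (a, b) in enumerate(zip(row_a, row_b))
        if a != b
    ]


# Fourth block (residues 181-240) of the ORF134ng / ORF134-1 alignment above.
ng_181 = "ENAFGNSDVLMLWSPYTTVMHQITGESHTNSITVKIKDNANTRVAEKGLAELLKARHGTE"
m1_181 = "ENAFGNSDVLMLWSPYTTVMHQITGESHTNSITVKIKDNANTQVAEKGLTDLLKARHGTE"
print(substitutions(ng_181, m1_181, start=181))
# [(223, 'R', 'Q'), (230, 'A', 'T'), (231, 'E', 'D')]
```

Applied block by block to the whole alignment, the same comparison yields eight positions (48, 68, 223, 230, 231, 253, 321 and 352), consistent with 380/388 identical residues.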



ORF134ng also shows homology to an E. coli ABC transporter: 






sp|P75831|YBJZ_ECOLI HYPOTHETICAL ABC TRANSPORTER ATP-BINDING PROTEIN YBJZ >gi5
(AE000189) o648; similar to YBBA_HAEIN SW: P45247 [Escherichia coli] Length = 648

Score = 297 bits (753), Expect = 6e-80
Identities = 162/389 (41%), Positives = 230/389 (58%), Gaps = 1/389 (0%)

Query: 1   MSVQAVLAHKMRSLLTMLXXXXXXXXXXXXXXLGNGSQKKILEDISSMGTNTISIFPGRG 60
           M+ +A+ A+KMR+LLTML              +G+ +++ +L DI S+GTNTI ++PG+
Sbjct: 260 MAWRALAANKMRTLLTMLGIIIGIASVVSIVVVGDAAKQMVLADIRSIGTNTIDVYPGKD 319

Query: 61  FGDRRSGKIKTLTIDDAKIIAKQSYVASATPMTSSGGTLTYRNTDLTASLYGVGEQYFDV 120
           FGD + L DD I KQ +VASATP S L Y N D+ AS GV YF+V
Sbjct: 320
Based on this analysis, including the presence of the leader peptide and transmembrane regions in 
the gonococcal protein, it is predicted that these proteins from N. meningitidis and N. gonorrhoeae, 
and their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies. 

Example 65 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 539>: 

1 . . GGGACGGGAG CGATGCTGCT GCTGTTTTAC GCGGTAACGA T.CTGCCTTT 

51 GGCCACTGGC GTTACCCTGA GTTACACCTC GTCGATTTTT TTGGCGGTAT 

101 TTTCCTTCCT GATTTTGAAA GAACGGATTT CCGTTTACAC GCAGGCGGTG 

151 CTGCTCCTTG GTTTTGCCGG CGTGGTATTG CTGCTTAATC CCTCGTTCCG 

201 CAGCGGTCAG GAAACGGCGG CACTCGCCGG GCTGGCGGGC GGCGCGATGT 
251 CCGGCTGGGC GTATTTGAAA GTGCGCGAAC TGTCTTTGGC GGGCGAACCC 

301 GGCTGGCGCG TCGTGTTTTA CCTTTCCGTG ACAGGTGTGG CGATGTCGTC 

351 GGTTTGGGCG ACGCTGACCG GCTGGCACAC CCTGTCCTTT CCATCGGCAG 

401 TTTATCTGTC GTGCATCGGC GTGTCCGCGC TGATTGCCCA ACTGTCGATG 

451 ACGCGCGCCT ACAAAGTCGG CGACAAATTC ACGGTTGCCT CGCTTTCCTA 

501 TATGACCGTC GTTTTTTCCG CTCTGTCTGC CGCATTTTTT CTGGGCGAAG 

551 AGCTTTTCTG GCAGGAAATA CTCGGTATGT GCATCATCAT C£TCAGCGGT 

601 ATTTTGA 

This corresponds to the amino acid sequence <SEQ ID 540; ORF135>: 

1 . . GTGAMLLLFY AVTILPLATG VTLSYTSSIF LAVFSFLILK ERISVYTQAV 
51 LLLGFAGVVL LLNPSFRSGQ ETAALAGLAG GAMSGWAYLK VRELSLAGEP 
101 GWRVVFYLSV TGVAMSSVWA TLTGWHTLSF PSAVYLSCIG VSALIAQLSM 
151 TRAYKVGDKF TVASLSYMTV VFSALSAAFF LGEELFWQEI LGMCIIISAV 
201 F* 

Further work revealed the complete nucleotide sequence <SEQ ID 541>: 



1 ATGGATACCG CAAAAAAAGA CATTTTAGGA TCGGGCTGGA TGCTGGTGGC 

51 GGCGGCCTGC TTTACCATTA TGAACGTATT GATTAAAGAG GCATCGGCAA 

101 AATTTGCCCT CGGCAGCGGC GAATTGGTCT TTTGGCGCAT GCTGTTTTCA 

151 ACCGTTGCGC TCGGGGCTGC CGCCGTATTG CGTCGGGACA mCTTCCGCAC 

201 GCCCCATTGG AAAAACCACT TAAACCGCAG TATGGTCGGG ACGGGGGCGA 

251 TGCTGCTGCT GTTTTACGCG GTAACGCATC TGCCTTTGGC CACTGGCGTT 

301 ACCCTGAGTT ACACCTCGTC GATTTTTTTG GCGGTATTTT CCTTCCTGAT 

351 TTTGAAAGAA CGGATTTCCG TTTACACGCA GGCGGTGCTG CTCCTTGGTT 

401 TTGCCGGCGT GGTATTGCTG CTTAATCCCT CGTTCCGCAG CGGTCAGGAA 

451 ACGGCGGCAC TCGCCGGGCT GGCGGGCGGC GCGATGTCCG GCTGGGCGTA 

501 TTTGAAAGTG CGCGAACTGT CTTTGGCGGG CGAACCCGGC TGGCGCGTCG 

551 TGTTTTACCT TTCCGTGACA GGTGTGGCGA TGTCGTCGGT TTGGGCGACG 

601 CTGACCGGCT GGCACACCCT GTCCTTTCCA TCGGCAGTTT ATCTGTCGTG 

651 CATCGGCGTG TCCGCGCTGA TTGCCCAACT GTCGATGACG CGCGCCTACA 

701 AAGTCGGCGA CAAATTCACG GTTGCCTCGC TTTCCTATAT GACCGTCGTT 

751 TTTTCCGCTC TGTCTGCCGC ATTTTTTCTG GGCGAAGAGC TTTTCTGGCA 

801 GGAAATACTC GGTATGTGCA TCATCATCCT CAGCGGTATT TTGAGCAGCA 

851 TCCGCCCCAC TGCCTTCAAA CAGCGGCTGC AATCCCTGTT CCGCCAAAGA 

901 TAA 

This corresponds to the amino acid sequence <SEQ ID 542; ORF135-1>: 



1 MDTAKKDILG SGWMLVAAAC FTIMNVLIKE ASAKFALGSG ELVFWRMLFS

51 TVALGAAAVL RRDXFRTPHW KNHLNRSMVG TGAMLLLFYA VTHLPLATGV

101 TLSYTSSIFL AVFSFLILKE RISVYTQAVL LLGFAGVVLL LNPSFRSGQE

151 TAALAGLAGG AMSGWAYLKV RELSLAGEPG WRVVFYLSVT GVAMSSVWAT

201 LTGWHTLSFP SAVYLSCIGV SALIAQLSMT RAYKVGDKFT VASLSYMTVV

251 FSALSAAFFL GEELFWQEIL GMCIIILSGI LSSIRPTAFK QRLQSLFRQR

301 * 

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A) 

ORF135 shows 99.0% identity over a 197 aa overlap with an ORF (ORF135a) from strain A of N. meningitidis: 

                                                  10        20        30
orf135.pep                                GTGAMLLLFYAVTILPLATGVTLSYTSSIF
                                          ||||||||||||| ||||||||||||||||
orf135a     STVALGAAAVLRRDTFRTPHWKNHLNRSMVGTGAMLLLFYAVTHLPLATGVTLSYTSSIF
           50        60        70        80        90       100

                    40        50        60        70        80        90
orf135.pep  LAVFSFLILKERISVYTQAVLLLGFAGVVLLLNPSFRSGQETAALAGLAGGAMSGWAYLK
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf135a     LAVFSFLILKERISVYTQAVLLLGFAGVVLLLNPSFRSGQETAALAGLAGGAMSGWAYLK
          110       120       130       140       150       160

                   100       110       120       130       140       150
orf135.pep  VRELSLAGEPGWRVVFYLSVTGVAMSSVWATLTGWHTLSFPSAVYLSCIGVSALIAQLSM
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf135a     VRELSLAGEPGWRVVFYLSVTGVAMSSVWATLTGWHTLSFPSAVYLSCIGVSALIAQLSM
          170       180       190       200       210       220

                   160       170       180       190       200
orf135.pep  TRAYKVGDKFTVASLSYMTVVFSALSAAFFLGEELFWQEILGMCIIISAVFX
            ||||||||||||||||||||||||||||||||:|||||||||||||||
orf135a     TRAYKVGDKFTVASLSYMTVVFSALSAAFFLAEELFWQEILGMCIIILSGILSSIRPTAF
          230       240       250       260       270       280

orf135a     KQRLQSLFRQRX
          290       300

The complete length ORF135a nucleotide sequence <SEQ ID 543> is: 

1 ATGGATACCG CAAAAAAAGA CATTTTAGGA TCGGGCTGGA TGCTGGTGGC 

51 GGCGGCCTGC TTTACCATTA TGAACGTATT GATTAAAGAG GCATCGGCAA 

101 AATTTGCCCT CGGCAGCGGC GAATTGGTCT TTTGGCGCAT GCTGTTTTCA 

151 ACCGTTGCGC TCGGGGCTGC CGCCGTATTG CGTCGGGACA CCTTCCGCAC 

201 GCCCCATTGG AAAAACCACT TAAACCGCAG TATGGTCGGG ACGGGGGCGA 

251 TGCTGCTGCT GTTTTACGCG GTAACGCATC TGCCTTTGGC CACCGGCGTT 

301 ACCCTGAGTT ACACCTCGTC GATTTTTTTG GCGGTATTTT CCTTCCTGAT 

351 TTTGAAAGAA CGGATTTCCG TTTACACGCA GGCGGTGCTG CTCCTTGGTT 

401 TTGCCGGCGT GGTATTGCTG CTTAATCCCT CGTTCCGCAG CGGTCAGGAA 

451 ACGGCGGCAC TCGCCGGGCT GGCGGGCGGC GCGATGTCCG GCTGGGCGTA 

501 TTTGAAAGTG CGCGAACTGT CTTTGGCGGG CGAACCCGGC TGGCGCGTCG 

551 TGTTTTACCT TTCCGTGACA GGTGTGGCGA TGTCATCGGT TTGGGCGACG 

601 CTGACCGGCT GGCACACCCT GTCCTTTCCA TCGGCAGTTT ATCTGTCGTG 

651 CATCGGCGTG TCCGCGCTGA TTGCCCAACT GTCGATGACG CGCGCCTACA 

701 AAGTCGGCGA CAAATTCACG GTTGCCTCGC TTTCCTATAT GACCGTCGTT 

751 TTTTCCGCTC TGTCTGCCGC ATTTTTTCTG GCCGAAGAGC TTTTCTGGCA 

801 GGAAATACTC GGTATGTGCA TCATCATCCT CAGCGGTATT TTGAGCAGCA 

851 TCCGCCCCAC TGCCTTCAAA CAGCGGCTGC AATCCCTGTT CCGCCAAAGA 

901 TAA 

This encodes a protein having amino acid sequence <SEQ ID 544>: 



1 MDTAKKDILG SGWMLVAAAC FTIMNVLIKE ASAKFALGSG ELVFWRMLFS

51 TVALGAAAVL RRDTFRTPHW KNHLNRSMVG TGAMLLLFYA VTHLPLATGV

101 TLSYTSSIFL AVFSFLILKE RISVYTQAVL LLGFAGVVLL LNPSFRSGQE

151 TAALAGLAGG AMSGWAYLKV RELSLAGEPG WRVVFYLSVT GVAMSSVWAT

201 LTGWHTLSFP SAVYLSCIGV SALIAQLSMT RAYKVGDKFT VASLSYMTVV

251 FSALSAAFFL AEELFWQEIL GMCIIILSGI LSSIRPTAFK QRLQSLFRQR

301 * 

ORF135a and ORF135-1 show 99.3% identity in 300 aa overlap: 



orf135a.pep  MDTAKKDILGSGWMLVAAACFTIMNVLIKEASAKFALGSGELVFWRMLFSTVALGAAAVL
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf135-1     MDTAKKDILGSGWMLVAAACFTIMNVLIKEASAKFALGSGELVFWRMLFSTVALGAAAVL

orf135a.pep  RRDTFRTPHWKNHLNRSMVGTGAMLLLFYAVTHLPLATGVTLSYTSSIFLAVFSFLILKE
             |||:||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf135-1     RRDXFRTPHWKNHLNRSMVGTGAMLLLFYAVTHLPLATGVTLSYTSSIFLAVFSFLILKE

orf135a.pep  RISVYTQAVLLLGFAGVVLLLNPSFRSGQETAALAGLAGGAMSGWAYLKVRELSLAGEPG
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf135-1     RISVYTQAVLLLGFAGVVLLLNPSFRSGQETAALAGLAGGAMSGWAYLKVRELSLAGEPG

orf135a.pep  WRVVFYLSVTGVAMSSVWATLTGWHTLSFPSAVYLSCIGVSALIAQLSMTRAYKVGDKFT
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf135-1     WRVVFYLSVTGVAMSSVWATLTGWHTLSFPSAVYLSCIGVSALIAQLSMTRAYKVGDKFT

orf135a.pep  VASLSYMTVVFSALSAAFFLAEELFWQEILGMCIIILSGILSSIRPTAFKQRLQSLFRQR
             ||||||||||||||||||||:|||||||||||||||||||||||||||||||||||||||
orf135-1     VASLSYMTVVFSALSAAFFLGEELFWQEILGMCIIILSGILSSIRPTAFKQRLQSLFRQR

Homology with a predicted ORF from N. gonorrhoeae 

ORF135 shows 97% identity over a 201 aa overlap with a predicted ORF (ORF135ng) from N. gonorrhoeae: 

orf135.pep                                GTGAMLLLFYAVTXLPLATGVTLSYTSSIF 30
                                          ||||||||||||| |||:||||||||||||
orf135ng    STVTLGAAAVLRRDTFRTPHWKNHLNRSMVGTGAMLLLFYAVTHLPLTTGVTLSYTSSIF 335

orf135.pep  LAVFSFLILKERISVYTQAVLLLGFAGVVLLLNPSFRSGQETAALAGLAGGAMSGWAYLK 90
            ||||||||||||||||||||||||||||||||||||||||| ||||||||||||||||||
orf135ng    LAVFSFLILKERISVYTQAVLLLGFAGVVLLLNPSFRSGQEPAALAGLAGGAMSGWAYLK 395

orf135.pep  VRELSLAGEPGWRVVFYLSVTGVAMSSVWATLTGWHTLSFPSAVYLSCIGVSALIAQLSM 150
            |||||||||||||||||||:||||||||||||||||||||||||||| ||||||||||||
orf135ng    VRELSLAGEPGWRVVFYLSATGVAMSSVWATLTGWHTLSFPSAVYLSGIGVSALIAQLSM 455

orf135.pep  TRAYKVGDKFTVASLSYMTVVFSALSAAFFLGEELFWQEILGMCIIISAVF 201
            |||||||||||||||||||||||||||||||||||||||||||||||||:|
orf135ng    TRAYKVGDKFTVASLSYMTVVFSALSAAFFLGEELFWQEILGMCIIISAAF 506

An ORF135ng nucleotide sequence <SEQ ID 545> was predicted to encode a protein having amino 
acid sequence <SEQ ID 546>: 



1 MPSEKAFRRH LRTASFQGLH LHHFHQKVGK CGIIGFGIHI FPTLLPAAQG
51 ILDIQLGLFR IDFAALAVYR RTQVDFIHTV IDGIASDQAF SEVVQILRRL
101 NLGHFTDTHL IAQARRFIAD FGNIRPMRRG EAKTFCRCFR FDGIDGIHGD
151 FRQCGHINRL APGKDCRNGK RDKVFFHTRH YNQVCLEKTN CSARKIKFRH
201 QKQAKTHSTS LAARFTIRPS LSQRPFMDTA KKDILGSGWM LVAAACFTVM
251 NVLIKEASAK FALGSGELVF WRMLFSTVTL GAAAVLRRDT FRTPHWKNHL
301 NRSMVGTGAM LLLFYAVTHL PLTTGVTLSY TSSIFLAVFS FLILKERISV
351 YTQAVLLLGF AGVVLLLNPS FRSGQEPAAL AGLAGGAMSG WAYLKVRELS
401 LAGEPGWRVV FYLSATGVAM SSVWATLTGW HTLSFPSAVY LSGIGVSALI
451 AQLSMTRAYK VGDKFTVASL SYMTVVFSAL SAAFFLGEEL FWQEILGMCI
501 IISAAF*



Further work revealed the following gonococcal sequence <SEQ ID 547>: 



1 ATGGATACCG CAAAAAAAGA CATTTTAGGA TCGGGCTGGA TGCTGGTGGC 

51 GGCGGCCTGC TTCACCGTTA TGAACGTATT GATTAAAGAG GCATCGGCAA 

101 AATTTGCCCT CGGCAGCGGC GAATTGGTCT TTTGGCGCAT GCTGTTTTCA 

151 ACCGTTACGC TCGGTGCTGC CGCCGTATTG CGGCGCGACA CCTTCCGCAC 

201 GCCCCATTGG AAAAACCACT TAAACCGCAG TATGGTCGGG ACGGGGGCGA 

251 TGCTGCTGCT GTTTTACGCG GTAACGCATC TGCCTTTGAC AACCGGCGTT 

301 ACCCTGAGTT ACACCTCGTC GATTTTTttg GCGGTATTTT CCTTCCTGAT 

351 TTTGAAAGAA CGGATTTCCG TTTACACGCA GGCGGTGCTG CTCCTTGGTT 

401 TTGCCGGCGT GGTATTGCTG CTTAATCCCT CGTTCCGCAG CGGTCAGGAA 

451 CCGGCGGCAC TCGCCGGGCT GGCGGGCGGC GCGATGTCCG GCTGGGCGTA 

501 TTTGAAAGTG CGCGAACTGT CTTTGGCGGG CGAACCCGGC TGGCGCGTCG 

551 TGTTTTACCT TTCCGCAACC GGCGTGGCGA TGTCGTCggt ttgggcgacg 

601 Ctgaccggct ggCACAcccT GTCCTTTcca tcggcagttt ATCtgtCGGG 
651 CATCGGCGTG tccgcgCtgA TTGCCCAaCT GtcgatgAcg cGCGcctaca 

701 aaGTCGGCGA CAAATTCACG GTTGCCTCGC tttcctaTAt gaccgtcGTC 

751 TTTTCCGCCC TGTCTGCCGC ATTTTTTCTg ggcgaagagc tttTCtggCA 

801 GGAAATACTC GGTATGTGCA TCATTAtccT CAGCGGCATT TTGAGCAGCA 

851 TCCGCCCCAT TGCCTTCAAA CAGCGGCTGC AAGCCCTCTT CCGCCAAAGA 

901 TAA 

This corresponds to the amino acid sequence <SEQ ID 548; ORF135ng-1>: 

1 MDTAKKDILG SGWMLVAAAC FTVMNVLIKE ASAKFALGSG ELVFWRMLFS

51 TVTLGAAAVL RRDTFRTPHW KNHLNRSMVG TGAMLLLFYA VTHLPLTTGV

101 TLSYTSSIFL AVFSFLILKE RISVYTQAVL LLGFAGVVLL LNPSFRSGQE

151 PAALAGLAGG AMSGWAYLKV RELSLAGEPG WRVVFYLSAT GVAMSSVWAT

201 LTGWHTLSFP SAVYLSGIGV SALIAQLSMT RAYKVGDKFT VASLSYMTVV

251 FSALSAAFFL GEELFWQEIL GMCIIILSGI LSSIRPIAFK QRLQALFRQR

301 * 

ORF135ng-1 and ORF135-1 show 97.0% identity in a 300 aa overlap: 

orf135ng-1.pep  MDTAKKDILGSGWMLVAAACFTVMNVLIKEASAKFALGSGELVFWRMLFSTVTLGAAAVL
                ||||||||||||||||||||||:|||||||||||||||||||||||||||||:|||||||
orf135-1        MDTAKKDILGSGWMLVAAACFTIMNVLIKEASAKFALGSGELVFWRMLFSTVALGAAAVL

orf135ng-1.pep  RRDTFRTPHWKNHLNRSMVGTGAMLLLFYAVTHLPLTTGVTLSYTSSIFLAVFSFLILKE
                |||:||||||||||||||||||||||||||||||||:|||||||||||||||||||||||
orf135-1        RRDXFRTPHWKNHLNRSMVGTGAMLLLFYAVTHLPLATGVTLSYTSSIFLAVFSFLILKE

orf135ng-1.pep  RISVYTQAVLLLGFAGVVLLLNPSFRSGQEPAALAGLAGGAMSGWAYLKVRELSLAGEPG
                |||||||||||||||||||||||||||||| |||||||||||||||||||||||||||||
orf135-1        RISVYTQAVLLLGFAGVVLLLNPSFRSGQETAALAGLAGGAMSGWAYLKVRELSLAGEPG

orf135ng-1.pep  WRVVFYLSATGVAMSSVWATLTGWHTLSFPSAVYLSGIGVSALIAQLSMTRAYKVGDKFT
                ||||||||:||||||||||||||||||||||||||| |||||||||||||||||||||||
orf135-1        WRVVFYLSVTGVAMSSVWATLTGWHTLSFPSAVYLSCIGVSALIAQLSMTRAYKVGDKFT

orf135ng-1.pep  VASLSYMTVVFSALSAAFFLGEELFWQEILGMCIIILSGILSSIRPIAFKQRLQALFRQR
                |||||||||||||||||||||||||||||||||||||||||||||| |||||||:|||||
orf135-1        VASLSYMTVVFSALSAAFFLGEELFWQEILGMCIIILSGILSSIRPTAFKQRLQSLFRQR

Based on this analysis, including the presence of several putative transmembrane domains in the 
gonococcal protein, it is predicted that the proteins from N. meningitidis and N. gonorrhoeae, and 
their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies. 



Example 66 



The following DNA sequence was identified in N. meningitidis <SEQ ID 549>: 

1 ATGAAGCGGC GTATAGCCGT CTTCGTCCTG TTCCCGCAGA TAATCCGAGT 

51 TTTGGGACAA CTGTTGCCGA AAATCGTCAA TACAGTTCCG GCACATCGGA 

101 TGCTCTTCCA GATTTTCGGG ATGTTCTTTT TCTTCATACA CCAGCAATAT 

151 CTGCCCGGGA TCGCCGAAAT CGATTCCCCA TGCGGCATCG TGTTCGGTGC 

201 GCTCCTCTTC CGTCATCTGC CCGCGCATTG CCTGTATGGT AAAGCCGCCG 

251 TAGGGGATGC CgTTGCACAC GAACATCCAG TCGCTGATGT CGTCAACCGG 

301 AACGCAAACG cTTTCGCCTT GTTCGACATT GGTCAGTTCG CCsGGTTCAT 

351 TGTTCAGCAC ACCGTAAATA TAAAGACCGT CAAAATAAAT ATCGTCGATC 

401 CACATATGTT CGCAAATTTC GCCGTCTTCG CCGTCTTGGA AAAAAGGGAC 

451 TTTGACCATG GCAAAATCCA AGGCGGAAAT AATGCGGCGG CGTTCCCAAA 

501 AAAGcTCGCG CCAAAAATAT TTGAATGTTT TACGGGCGCG TTCGTCGGCA 

551 CGGTTTACCG GTTCGTCTGC CTGTTCTACA TAATAAATGA CGGAATCGCC 

601 CATCAT£TCT GCTCCTCAAC GTGTACGGTA TCTGTTTGCA CCTTACTGCG 

651 GCTTTCTgcC kTCGGCATCC GATTCGGATT TGAAAAGTTC mmrwyATTCG 

701 GAATAG 

This corresponds to the amino acid sequence <SEQ ID 550; ORF136>: 

1 MKRRIAVFVL FPQIIRVLGQ LLPKIVNTVP AHRMLFQIFG MFFFFIHQQY 

51 LPGIAEIDSP CGIVFGALLF RHLPAHCLYG KAAVGDAVAH EHPVADVVNR 
101 NANAFALFDI GQFAXFIVQH TVNIKTVKIN IVDPHMFANF AVFAVLEKRD 
151 FDHGKIQGGN NAAAFPKKLA PKIFECFTGA FVGTVYRFVC LFYIINDGIA 
201 HHSAPQRVRY LFAPYCGFLP SASDSDLKSS XXSE* 

Further work revealed the complete nucleotide sequence <SEQ ID 551>: 



1 ATGATGAAGC GGCGTATAGC CGTCTTCGTC CTGTTCCCGC AGATAATCCG 

51 AGTTTTGGGA CAACTGTTGC CGAAAATCGT CAATACAGTT CCGGCACATC 

101 GGATGCTCTT CCAGATTTTC GGGATGTTCT TTTTCTTCAT ACACCAGCAA 

151 TATCTGCCCG GGATCGCCGA AATCGATTCC CCATGCGGCA TCGTGTTCGG 

201 TGCGCTCCTC TTCCGTCATC TGCCCGCGCA TTGCCTGTAT GGTAAAGCCG 

251 CCGTAGGGGA TGCCGTTGCA CACGAACATC CAGTCGCTGA TGTCGTCAAC 

301 CGGAACGCAA ACGCTTTCGC CTTGTTCGAC ATTGGTCAGT TCGCCGGGTT 

351 CATTGTTCAG CACACCGTAA ATATAAAGAC CGTCAAAATA AATATCGTCG 

401 ATCCACATAT GTTCGCAAAT TTCGCCGTCT TCGCCGTCTT GGAAAAAAGG 

451 GACTTTGACC ATGGCAAAAT CCAAGGCGGA AATAATGCGG CGGCGTTCCC 

501 AAAAAAGCTC GCGCCAAAAA TATTTGAATG TTTTACGGGC GCGTTCGTCG 

551 GCACGGTTTA CCGGTTCGTC TGCCTGTTCT ACATAATAAA TGACGGAATC 

601 GCCCATCATT CTGCTCCTCA ACGTGTACGG TATCTGTTTG CACCTTACTG 

651 CGGCTTTCTG CCTTCGGCAT CCGATTCGGA TTTGAAAAGT TCCAAATATT 

701 CGGAATAG 

This corresponds to the amino acid sequence <SEQ ID 552; ORF136-1>: 



1 MMKRRIAVFV LFPQIIRVLG QLLPKIVNTV PAHRMLFQIF GMFFFFIHQQ

51 YLPGIAEIDS PCGIVFGALL FRHLPAHCLY GKAAVGDAVA HEHPVADVVN

101 RNANAFALFD IGQFAGFIVQ HTVNIKTVKI NIVDPHMFAN FAVFAVLEKR

151 DFDHGKIQGG NNAAAFPKKL APKIFECFTG AFVGTVYRFV CLFYIINDGI

201 AHHSAPQRVR YLFAPYCGFL PSASDSDLKS SKYSE* 

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A) 

ORF136 shows 71.7% identity over a 237aa overlap with an ORF (ORF136a) from strain A of N. meningitidis:

10 20 30 40 50 59

orf136.pep MKRRIAVFVLFPQIIRVLGQLLPKIVNTVPAHRMLFQIFGMFFFFIHQQYLPGIAEIDS
I I I I I I I I I I : I I I : I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I 
orf136a MMKRRIAVFVLLMQKIRILGQLLPKIVNTVPAHRMLFQXFGMFFFFIHQQYLPGIAEIDS

10 20 30 40 50 60 



60 70 80 90 100 110 119 

orf136.pep PCGIVFGALLFRHLPAHCLYGKAAVGDAVAHEHPVADVVNRNANAFALFDIGQFAXFIVQ
I I I I I I I : I I I II : ! I I I I I I I I I : I I I I I I I I I I I I II I I I I I I I I I I I I I I MM 
orf136a PCGIVFGTLLFRHXSTHCLYGKAAVGNAVAHEHPVADVVNRNANAFALFDIGQFAGFIVQ

70 80 90 100 110 120 



120 130 140 150 160 170 179 

orf136.pep HTVNIKTVKINIVDPHMFANFAVFAVLEKRDFDHGKIQGGNNAAAFPKKLAPKIFECFTG
I :: I : II I II I II I II I II I I I I I II I II : : I : I : I : : : : 

orf136a HAINVKTVKINIVDPHMFANFAXFAVLEKRALTMAKSKXXXMRRRSQKSSRQKYLNVLRA

130 140 150 160 170 180 



180 190 200 210 220 230 

orf136.pep AFVGTVYRFVCLFYIINDGIAHHSAPQRVRYLFAPYCGFLPSASDSDLKSSXXSEX

: 11:1 : : : : I I II I I M II II II II I II I I I I I I I II ill 

orf136a RSPARFTGLSACSTXXMTESPIISAPQRVRYLFAPYCGFLPSASDSDLKSSKYSEX

190 200 210 220 230 
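The identity figures in these comparisons give the share of aligned positions that carry the same residue. The text does not name the alignment program or its gap conventions, so the sketch below is an illustration only. It assumes two sequences that are already aligned column by column, with '-' marking gaps, and it counts only columns where neither sequence has a gap. As a worked check, the second row of the ORF138a/ORF138-1 comparison in Example 68 differs at one position out of 60 (M versus L), which gives 59/60, or about 98.3%. By the same arithmetic, 97.3% of a 300-residue overlap corresponds to about 292 identical residues. The function name `percent_identity` is hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over the gap-free columns of two pre-aligned sequences.

    Assumption: both strings are already aligned column by column, with '-' for
    gaps. Columns where either sequence has a gap are left out of the overlap;
    this is one common convention, not necessarily the one behind the figures above.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    overlap = identical = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == "-" or y == "-":
            continue  # gap column: not part of the overlap
        overlap += 1
        identical += x == y
    return 100.0 * identical / overlap if overlap else 0.0


# Second alignment row of ORF138a versus ORF138-1 (Example 68).
orf138a_row = "MRQAGMNPDPKTVKAVFAETAKGGLELAPAFFRKPEDIETMFKAVHGWEHVQQALDKHEG"
orf138_1_row = "MRQAGLNPDPKTVKAVFAETAKGGLELAPAFFRKPEDIETMFKAVHGWEHVQQALDKHEG"
print(round(percent_identity(orf138a_row, orf138_1_row), 2))  # 98.33
```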

The complete length ORF136a nucleotide sequence <SEQ ID 553> is:



1 ATGATGAAGC GGCGTATAGC CGTCTTCGTC CTGCTCATGC AGAAAATCCG 

51 GATTTTGGGA CAACTGTTGC CGAAAATCGT CAATACAGTT CCGGCACATC 

101 GGATGCTCTT CCAGATNTTC GGGATGTTCT TTTTCTTCAT ACACCAGCAA 

151 TACCTGCCCG GGATCGCCGA AATCGATTCC CCATGCGGCA TCGTGTTCGG 

201 TACGCTCCTC TTCCGTCATC NGTCCACGCA TTGCCTGTAT GGTAAAGCCG 

251 CCGTAGGGAA TGCCGTTGCA CACGAACATC CAGTCGCTGA TGTCGTCAAC 

301 CGGAACGCAA ACGCTTTCGC CTTGTTCGAC ATTGGTCAGT TCGCCGGGTT

351 CATTGTTCAG CACGCCATAA ATGTAAAGAC CGTCAAAATA AATATCGTCG

401 ATCCACATAT GTTCGCAAAT TTCGCCNTCT TCGCCGTCTT GGAAAAAAGG

451 GCTTTGACCA TGGCAAAATC TAAGGNGNNA NNGATGCGGC GGCGTTCCCA

501 AAAAAGCTCG CGCCAAAAAT ATTTGAATGT TTTGCGGGCG CGTTCGCCGG

551 CACGGTTTAC CGGTTTGTCT GCCTGTTCTA CATAATAAAT GACGGAATCG

601 CCCATCATAT CTGCTCCTCA ACGTGTACGG TATCTGTTTG CACCTTACTG

651 CGGCTTTCTG CCTTCGGCAT CCGATTCGGA TTTGAAAAGT TCCAAATATT

701 CGGAATAG

This encodes a protein having amino acid sequence <SEQ ID 554>:

1 MMKRRIAVFV LLMQKIRILG QLLPKIVNTV PAHRMLFQXF GMFFFFIHQQ

51 YLPGIAEIDS PCGIVFGTLL FRHXSTHCLY GKAAVGNAVA HEHPVADVVN

101 RNANAFALFD IGQFAGFIVQ HAINVKTVKI NIVDPHMFAN FAXFAVLEKR 

151 ALTMAKSKXX XMRRRSQKSS RQKYLNVLRA RSPARFTGLS ACST**MTES 

201 PIISAPQRVR YLFAPYCGFL PSASDSDLKS SKYSE* 






ORF136a and ORF136-1 show 73.1% identity in 238 aa overlap: 






10 20 30 40 50 60 

orf136a.pep MMKRRIAVFVLLMQKIRILGQLLPKIVNTVPAHRMLFQXFGMFFFFIHQQYLPGIAEIDS
I I I I I I I I I I I : I I I : I I I I I I I I I I I I I I I t I I I I I I I I I I I I I I I I I I I I I I I I I 
orf136-1 MMKRRIAVFVLFPQIIRVLGQLLPKIVNTVPAHRMLFQIFGMFFFFIHQQYLPGIAEIDS

10 20 30 40 50 60 






70 80 90 100 110 120 

orf136a.pep PCGIVFGTLLFRHXSTHCLYGKAAVGNAVAHEHPVADVVNRNANAFALFDIGQFAGFIVQ
I I I I I I I : I I I I I : I I I I I I I I I I : I I I II I I I I I I I I II I I I I I I I I I I I I I I I I I I 
orf136-1 PCGIVFGALLFRHLPAHCLYGKAAVGDAVAHEHPVADVVNRNANAFALFDIGQFAGFIVQ

70 80 90 100 110 120 






130 140 150 160 170 180 

orf136a.pep HAINVKTVKINIVDPHMFANFAXFAVLEKRALTMAKSKXXXMRRRSQKSSRQKYLNVLRA
I :: I : I I I I I I i I I I I I I I I I I I I I I I I I : : I : I : I : : : : 

orf136-1 HTVNIKTVKINIVDPHMFANFAVFAVLEKRDFDHGKIQGGNNAAAFPKKLAPKIFECFTG

130 140 150 160 170 180 

190 200 210 220 230 

orf136a.pep RSPARFTGLSACSTXXMTESPIISAPQRVRYLFAPYCGFLPSASDSDLKSSKYSEX

: 11:1 : : : : I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

orf136-1 AFVGTVYRFVCLFYIINDGIAHHSAPQRVRYLFAPYCGFLPSASDSDLKSSKYSEX

190 200 210 220 230 

Homology with a predicted ORF from N. gonorrhoeae 

ORF136 shows 92.3% identity over a 234aa overlap with a predicted ORF (ORF136ng) from
N. gonorrhoeae:

orf136.pep MKRRIAVFVLFPQIIRVLGQLLPKIVNTVPAHRMLFQIFGMFFFFIHQQYLPGIAEIDS 59
I I I I t 1 I 1 I I r | II : I I I i I I I II !! M I I I I I I I I I I I I II I I !: 1 ! II I i I I I I I
orf136ng MMKRRIAVFVLLMQKIRILGQLLPKIVNTVPAHRMLFQIFGMFFFFIHRQYLPGIAEIDS 60

orf136.pep PCGIVFGALLFRHLPAHCLYGKAAVGDAVAHEHPVADVVNRNANAFALFDIGQFAXFIVQ 119
I 11111:111111 I I I I I I I I I I I I I I I I I I I I I I I : I I I I I I I I I I I M I I I I I I
orf136ng PGGIVFGTLLFRHLSAHCLYGKAAVGDAVAHEHPVADVANRNANAFALFDIGQSAGFIVQ 120

orf136.pep HTVNIKTVKINIVDPHMFANFAVFAVLEKRDFDHGKIQGGNNAAAFPKKLAPKIFECFTG 179
II II Mill II III II I Mill II I II I! II MM III II I I I I II I I I M I 1:11 II I I
orf136ng HTVNIKTVKINIVDPHMFANFAVFAVLEKRDFDHGKIQGGNNAAAFPKKLAPKVFECFTG 180

orf136.pep AFVGTVYRFVCLFYIINDGIAHHSAPQRVRYLFAPYCGFLPSASDSDLKSSXXSE 234
I M I I I I I I II II I I I I I I I I I I : I I I I II I I M II I I I I I I I I I I I I I II
orf136ng AFAGTVYRFVCLFYIINDGIAHHTAPQRVRYLFAPYRGFLPPASDSDLKSSKYSE 235

The complete length ORF136ng nucleotide sequence <SEQ ID 555> is:

1 ATGATGAAGC GGCGTATAGC CGTCTTCGTC CTGCTCATGC AGAAAATCCG

51 GATTTTGGGA CAACTGTTGC CGAAAATCGT CAATACAGTT CCGGCACATC

101 GGATGCTCTT CCAAATTTTC GGGATGTTCT TTTTCTTCAT ACACCGGCAA

151 TACCTGCCCG GGATCGCCGA AATCGATTCC CCAGGCGGTA TCGTGTTCGG 

201 TACGCTCCTC TTCCGTCATC TGTCCGCGCA TTGCCTGTAC GGTAAAGCCG 

251 CCGTAGGGGA TGCCGTTGCA CACGAACATC CAGTCGCTGA TGTCGCCAAC 

301 CGGAACGCAA ACGCTTTCGC CTTGTTCGAC ATTGGTCAGT CCGCCGGGTT 

351 CATTGTTCAG CACACCGTAA ATATAAAGAC CGTCAAAATA AATATCGTCG 

401 ATCCACATAT GTTCGCAAAT TTCGCCGTCT TCGCCGTCTT GGAAAAAAGG 

451 GACTTTGACC ATGGCAAAAT CCAAGGCGGA AATAATGCGG CGGCGTTCCC 

501 AAAAAAGCTC GCGCCAAAAG TATTTGAATG TTTTACGGGC GCGTTCGCCG 

551 GCACGGTTTA CCGGTTCGTC TGCCTGTTCT ACATAATAAA TGACGGAATC 

601 GCCCATCATA CTGCTCCTCA ACGTGTACGG TATCTGTTTG CACCTTACCG 

651 CGGTTTTCTA CCTCCGGCAT CCGATTCGGA TTTGAAAAGT TCCAAATATT 

701 CGGAATAG 

This encodes a protein having amino acid sequence <SEQ ID 556>: 

1 MMKRRIAVFV LLMQKIRILG QLLPKIVNTV PAHRMLFQIF GMFFFFIHRQ

51 YLPGIAEIDS PGGIVFGTLL FRHLSAHCLY GKAAVGDAVA HEHPVADVAN 

101 RNANAFALFD IGQSAGFIVQ HTVNIKTVKI NIVDPHMFAN FAVFAVLEKR 

151 DFDHGKIQGG NNAAAFPKKL APKVFECFTG AFAGTVYRFV CLFYIINDGI

201 AHHTAPQRVR YLFAPYRGFL PPASDSDLKS SKYSE* 

ORF136ng and ORF136-1 show 93.6% identity in 235 aa overlap: 

orf136ng MMKRRIAVFVLLMQKIRILGQLLPKIVNTVPAHRMLFQIFGMFFFFIHRQYLPGIAEIDS
I I I I I I I II I I : I I I : I I I I I I I I I I I I I I II I I I I I I I I I I I I I I : I I I I I I I II I I 
orf136-1 MMKRRIAVFVLFPQIIRVLGQLLPKIVNTVPAHRMLFQIFGMFFFFIHQQYLPGIAEIDS

orf136ng PGGIVFGTLLFRHLSAHCLYGKAAVGDAVAHEHPVADVANRNANAFALFDIGQSAGFIVQ
I I I I I I : I II I II I I I I I I I I I I I I I I I I I I I I I I I : I I I i I I I I I I I I I I I I I I I I 
orf136-1 PCGIVFGALLFRHLPAHCLYGKAAVGDAVAHEHPVADVVNRNANAFALFDIGQFAGFIVQ

orf136ng HTVNIKTVKINIVDPHMFANFAVFAVLEKRDFDHGKIQGGNNAAAFPKKLAPKVFECFTG
I II I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I : I I I I I I 
orf136-1 HTVNIKTVKINIVDPHMFANFAVFAVLEKRDFDHGKIQGGNNAAAFPKKLAPKIFECFTG

orf136ng AFAGTVYRFVCLFYIINDGIAHHTAPQRVRYLFAPYRGFLPPASDSDLKSSKYSEX
I I : I I I II I I I I I I I I I I I I I I I : I I I I I II I I I II I I I I I I I I I I I I I I I I I I 
orf136-1 AFVGTVYRFVCLFYIINDGIAHHSAPQRVRYLFAPYCGFLPSASDSDLKSSKYSEX

Based on the presence of the putative transmembrane domains in the gonococcal protein, it is 
predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be 
useful antigens for vaccines or diagnostics, or for raising antibodies. 
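The transmembrane-domain predictions in this section do not name the method used. One common way to flag candidate membrane-spanning segments is a Kyte-Doolittle hydropathy scan. It averages a per-residue hydrophobicity value over a sliding window and reports windows at or above a threshold. The sketch below uses a 19-residue window and a cutoff of 1.6. Both are commonly quoted defaults and both are assumptions here; this is not necessarily the analysis behind the prediction above. Residues outside the 20 standard amino acids (X, *) score 0. The function names are hypothetical.

```python
# Kyte-Doolittle hydropathy values per residue.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def window_hydropathy(protein: str, window: int = 19) -> list[float]:
    """Mean hydropathy of every full-length window; unknown residues score 0."""
    scores = [KYTE_DOOLITTLE.get(res, 0.0) for res in protein.upper()]
    return [
        sum(scores[start:start + window]) / window
        for start in range(len(scores) - window + 1)
    ]


def candidate_tm_windows(protein: str, window: int = 19,
                         threshold: float = 1.6) -> list[int]:
    """0-based start positions of windows whose mean hydropathy reaches the threshold."""
    return [
        start
        for start, value in enumerate(window_hydropathy(protein, window))
        if value >= threshold
    ]
```

Overlapping hits usually mark a single hydrophobic stretch. A sequence shorter than the window returns no windows.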



Example 67 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 557>:

1 ATGGAAAATA TGGTAACGTT TTCAAAAATC AGACCGCTTT TGGCAATCGC 

51 CGCCGCCGCG TTGCTTGCCG CC.TGCGGAC GGCGGGAAAT AATGCTGTCC 

101 GCAAGCCGGT GCAAACCGCC AAACCCGCCG CAGTGGTCGG TTTGGCACTC 

151 GGTGGCGGCG CATCTAAAGG ATTTGCCCAT GTAGGTATTA TTAAGGTTTT 

201 GAAAGAAAAC GGTATTCCTG TGAAGGTGGT TACCGGCACC TCCGCAGGTT 

251 CGATTGTCGG CAACCTTTTT GCATCGGGTA TGTCGCCCGA CCGCCTCGAA 

301 TTGGAAGCCG AAATTTTAGG CAAAACCGAT TTGGTCGATT TAACCTTGTC 

351 CACCAATGGG TTTATCAAAG GCGCAAAGCT GCAAAATTAC ATCAACCGAA 

401 AACTCCGCGG CATGCAGATT CAGCAGTTTC CCATCAAATT TGCCGCC. . 

This corresponds to the amino acid sequence <SEQ ID 558; ORF137>: 

1 MENMVTFSKI RPLLAIAAAA LLAAXRTAGN NAVRKPVQTA KPAAVVGLAL

51 GGGASKGFAH VGIIKVLKEN GIPVKVVTGT SAGSIVGNLF ASGMSPDRLE

101 LEAEILGKTD LVDLTLSTNG FIKGAKLQNY INRKLRGMQI QQFPIKFAA. . 



Further work revealed the complete nucleotide sequence <SEQ ID 559>: 

1 ATGGAAAATA TGGTAACGTT TTCAAAAATC AGACCGCTTT TGGCAATCGC

51 CGCCGCCGCG TTGCTTGCCG CCTGCGGCAC GGCGGGAAAT AATGCTGTCC 

101 GCAAGCCGGT GCAAACCGCC AAACCCGCCG CAGTGGTCGG TTTGGCACTC 

151 GGTGGCGGCG CATCTAAAGG ATTTGCCCAT GTAGGTATTA TTAAGGTTTT 

201 GAAAGAAAAC GGTATTCCTG TGAAGGTGGT TACCGGCACA TCGGCAGGTT 

251 CGATTGTCGG CAGCCTTTTT GCATCGGGTA TGTCGCCCGA CCGCCTCGAA 

301 TTGGAAGCCG AAATTTTAGG CAAAACCGAT TTGGTCGATT TAACCTTGTC 

351 CACCAGTGGT TTTATCAAAG GCGAAAAGCT GCAAAATTAC ATCAACCGAA 

401 AAGTCGGCGG CAGGCAGATT CAGCAGTTTC CCATCAAATT TGCCGCCGTT 

451 GCTACTGATT TTGAAACCGG CAAGGCCGTC GCTTTCAATC AGGGGAATGC 

501 CGGGCAGGCT GTGCGCGCTT CCGCCGCCAT TCCCAATGTG TTCCAACCCG 

551 TTATCATCGG CAGGCATACA TATGTTGACG GCGGTCTGTC GCAGCCCGTG 

601 CCCGTCAGTG CCGCCCGGCG GCAGGGGGCG AATTTCGTGA TTGCCGTCGA 

651 TATTTCCGCC CGTCCGGGCA AAAACATCAG CCAAGGTTTC TTCTCTTATC 

701 TCGATCAGAC GCTGAACGTA ATGAGCGTTT CTGCGTTGCA AAATGAGTTG 

751 GGGCAGGCGG ATGTGGTTAT CAAACCGCAG GTTTTGGATT TGGGTGCAGT 

801 CGGCGGATTC GATCAGAAAA AACGCGCCAT CCGGTTGGGT GAGGAGGCAG 

851 CACGTGCCGC ATTGCCTGAA ATCAAACGCA AACTGGCGGC ATACCGTTAT 

901 TGA 
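The listings in these examples use a fixed layout: 50 symbols per row, split into blocks of 10, with each row prefixed by the 1-based position of its first symbol. The same layout is used for both nucleotide and amino acid sequences. A small formatter such as the hypothetical `format_listing` below can regenerate that layout from a plain sequence string. This makes it easier to check a listing row by row.

```python
def format_listing(seq: str, line_len: int = 50, group: int = 10) -> str:
    """Lay out a sequence as numbered rows of `line_len` symbols in blocks of `group`."""
    seq = "".join(seq.split())  # ignore any existing spacing or line breaks
    rows = []
    for start in range(0, len(seq), line_len):
        chunk = seq[start:start + line_len]
        blocks = [chunk[i:i + group] for i in range(0, len(chunk), group)]
        rows.append(f"{start + 1} " + " ".join(blocks))  # 1-based position label
    return "\n".join(rows)


print(format_listing("ATGGAAAATATGGTAACGTTTTCAAAAATC"))
```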

This corresponds to the amino acid sequence <SEQ ID 560; ORF137-1>:

1 MENMVTFSKI RPLLAIAAAA LLAACGTAGN NAVRKPVQTA KPAAVVGLAL
51 GGGASKGFAH VGIIKVLKEN GIPVKVVTGT SAGSIVGSLF ASGMSPDRLE
101 LEAEILGKTD LVDLTLSTSG FIKGEKLQNY INRKVGGRQI QQFPIKFAAV
151 ATDFETGKAV AFNQGNAGQA VRASAAIPNV FQPVIIGRHT YVDGGLSQPV
201 PVSAARRQGA NFVIAVDISA RPGKNISQGF FSYLDQTLNV MSVSALQNEL
251 GQADVVIKPQ VLDLGAVGGF DQKKRAIRLG EEAARAALPE IKRKLAAYRY
301 * 

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A) 

ORF137 shows 93.3% identity over a 149aa overlap with an ORF (ORF137a) from strain A of N.
meningitidis:

10 20 30 40 50 60 

orf137.pep MENMVTFSKIRPLLAIAAAALLAAXRTAGNNAVRKPVQTAKPAAVVGLALGGGASKGFAH
I I I I I I I I I I I I i I I I I I I I I I I I I I I I I I : i I I I I I I I I I I I I 1 I I I I I I I I I I I I I 
orf137a MENMVTFSKIRPLLAIAAAALLAACGTAGNNAARKPVQTAKPAAVVGLALGGGASKGFAH

10 20 30 40 50 60 

70 80 90 100 110 120 

orf137.pep VGIIKVLKENGIPVKVVTGTSAGSIVGNLFASGMSPDRLELEAEILGKTDLVDLTLSTNG
I I I I I I I I I I 1 I I I I I I I I i I I 1 I I II : I I I I I I I I I I I I I I II I I I I I I I I I I t I I I : I 
orf137a VGIIKVLKENGIPVKVVTGTSAGSIVGSLFASGMSPDRLELEAEILGKTDLVDLTLSTSG

70 80 90 100 110 120 

130 140 149 

orf137.pep FIKGAKLQNYINRKLRGMQIQQFPIKFAA
I I I I MINIMI: | : I I II II I i II 
orf137a FIKGEKLQNYINRKVGGRRIQQFPIKFAAVATDFETGKAVAFNQGNAGQAVRASAAIPNV

130 140 150 160 170 180 

The complete length ORF137a nucleotide sequence <SEQ ID 561> is: 

1 ATGGAAAATA TGGTAACGTT TTCAAAAATC AGACCGCTTT TGGCAATCGC 

51 CGCCGCCGCG TTGCTTGCCG CCTGCGGCAC GGCGGGAAAT AATGCTGCCC 

101 GCAAGCCGGT GCAAACCGCC AAACCCGCCG CAGTGGTCGG TTTGGCACTC 

151 GGTGGCGGCG CATCTAAAGG ATTTGCCCAT GTAGGTATTA TTAAGGTTTT 

201 GAAAGAAAAC GGTATTCCTG TGAAGGTGGT TACCGGCACA TCGGCAGGTT 

251 CGATAGTCGG CAGCCTTTTT GCATCGGGTA TGTCGCCCGA CCGCCTCGAA

301 TTGGAAGCCG AAATTTTAGG TAAAACCGAT TTGGTCGATT TAACCTTGTC 

351 CACCAGTGGT TTTATCAAAG GCGAAAAGCT GCAAAATTAC ATCAACCGAA 

401 AAGTCGGCGG CAGGCGGATT CAGCAGTTTC CCATCAAATT TGCCGCCGTT 

451 GCTACTGATT TTGAAACCGG CAAGGCCGTC GCTTTCAATC AAGGGAATGC 

501 CGGGCAGGCT GTGCGCGCTT CCGCCGCCAT TCCCAATGTG TTCCAACCCG 

551 TTATCATCGG CAGGCATACA TATGTTGACG GCGGTCTGTC GCAGCCCGTG 






601 CCCGTCAGTG CCGCCCGGCG GCANGNNNNG NATNTCGTGA TTGCCGTCGA

651 TATTTCCGCC CGTCCGAGCA AAAACATCAG CCAAGGCTTC TTCTCTTATC

701 TCGATCAGAC GCTGAACGTA ATGAGCGTTT CCGCGTTGCA AAATGAGTTG

751 GGGCAGGCGG ATGTGGTTAT CAAACCGCAG GTTTTGGATT TGGGTGCAGT

801 CGGCGGATTC GATCAGAAAA AACGCGCCAT CCGGTTGGGT GAGGAGGCAG

851 CACGTGCCGC ATTGCCTGAA ATCAAACGCA AACTGGCGGC ATACCGTTAT

901 TGA



This encodes a protein having amino acid sequence <SEQ ID 562>: 



1 MENMVTFSKI RPLLAIAAAA LLAACGTAGN NAARKPVQTA KPAAVVGLAL

51 GGGASKGFAH VGIIKVLKEN GIPVKVVTGT SAGSIVGSLF ASGMSPDRLE

101 LEAEILGKTD LVDLTLSTSG FIKGEKLQNY INRKVGGRRI QQFPIKFAAV

151 ATDFETGKAV AFNQGNAGQA VRASAAIPNV FQPVIIGRHT YVDGGLSQPV

201 PVSAARRXXX XXVIAVDISA RPSKNISQGF FSYLDQTLNV MSVSALQNEL

251 GQADVVIKPQ VLDLGAVGGF DQKKRAIRLG EEAARAALPE IKRKLAAYRY

301 *



ORF137a and ORF137-1 show 97.3% identity in 300 aa overlap: 



orf137a.pep MENMVTFSKIRPLLAIAAAALLAACGTAGNNAARKPVQTAKPAAVVGLALGGGASKGFAH
M I I I I I I I I I I I I II I I I II I I M I I I I I I I : I I I I I I I I I I I II I I f I I M I ! I I I I I 
orf137-1 MENMVTFSKIRPLLAIAAAALLAACGTAGNNAVRKPVQTAKPAAVVGLALGGGASKGFAH

orf137a.pep VGIIKVLKENGIPVKVVTGTSAGSIVGSLFASGMSPDRLELEAEILGKTDLVDLTLSTSG

I I I I I I I I I I I I I I I I I I II I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
orf137-1 VGIIKVLKENGIPVKVVTGTSAGSIVGSLFASGMSPDRLELEAEILGKTDLVDLTLSTSG

orf137a.pep FIKGEKLQNYINRKVGGRRIQQFPIKFAAVATDFETGKAVAFNQGNAGQAVRASAAIPNV

I I I I I I I I I I I I I I I I I I : I II I I I I I I I I II I I I I I II I I I I I I I I I I I I I I I ! I I I I I 
orf137-1 FIKGEKLQNYINRKVGGRQIQQFPIKFAAVATDFETGKAVAFNQGNAGQAVRASAAIPNV

orf137a.pep FQPVIIGRHTYVDGGLSQPVPVSAARRXXXXXVIAVDISARPSKNISQGFFSYLDQTLNV

II I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I : I I I I I I I 11 I I I I I I I I 
orf137-1 FQPVIIGRHTYVDGGLSQPVPVSAARRQGANFVIAVDISARPGKNISQGFFSYLDQTLNV

orf137a.pep MSVSALQNELGQADVVIKPQVLDLGAVGGFDQKKRAIRLGEEAARAALPEIKRKLAAYRY
I I I I I I I I I I M I I M I I I I I I I I I I I I ! I I I I I I I I I I I II I I I I I ! II I I I I I I I I II 
orf137-1 MSVSALQNELGQADVVIKPQVLDLGAVGGFDQKKRAIRLGEEAARAALPEIKRKLAAYRY

Homology with a predicted ORF from N. gonorrhoeae 

ORF137 shows 89.9% identity over a 149aa overlap with a predicted ORF (ORF137ng) from 
N. gonorrhoeae: 



orf137.pep MENMVTFSKIRPLLAIAAAALLAAXRTAGNNAVRKPVQTAKPAAVVGLALGGGASKGFAH 60
1 1 I I I I 1 I I t t rlllllllllll M M II: I I I I I I I M I M hi M I I I II I I I i I
orf137ng MENMVTFSKIRSFLAIAAAALLAACGTAGNNAARKPVQTAKPAAVVALALGGGASKGFAH 60

orf137.pep VGIIKVLKENGIPVKVVTGTSAGSIVGNLFASGMSPDRLELEAEILGKTDLVDLTLSTNG 120
:M:| Ml III II II IMMMI I [ll:|:MIMI I I I I M II III II I I I III I I 1:1
orf137ng IGIVKVLKENGIPVKVVTGTSAGSIVGSLLASGMSPDRLELEAEILGKTDLVDLTLSTSG 120

orf137.pep FIKGAKLQNYINRKLRGMQIQQFPIKFAA 149
I I I I I I I I I I I I I : I I I I I I I I I I II
orf137ng FIKGEKLQNYINRKVGGRQIQQFPIKFAAVATDFETGKAVAFNQGNAGQAVRASAAIPNV 180



The complete length ORF137ng nucleotide sequence <SEQ ID 563> is: 



1 ATGGAAAATA TGGTAACGTT TTCAAAAATC AGATCATTTT TGGCAATCGC

51 CGCCGCCGCG TTGCTTGCCG CCTGCGGTAC GGCGGGAAAC AATGCCGCCC

101 GCAAGCCGGT GCAAACCGCC AAACCCGCCG CAGTGGTCGC TTTGGCACTC

151 GGTGGCGGCG CATCTAAAGG ATTTGCCCAT ATAGGAATTG TTAAGGTTTT

201 GAAAGAAAAC GGTATTCCTG TGAAGGTGGT TACCGGCACA TCGGCAGGTT

251 CGATAGTCGG CAGCCTTTTG GCATCGGGTA TGTCGCCCGA CCGCCTCGAA

301 TTGGAAGCCG AGATTTTAGG TAAAACCGAT TTAGTCGATT TAACCTTGTC

351 CACCAGTGGT TTTATCAAAG GCGAAAAGCT GCAAAATTAC ATCAACCGAA

401 AAGTCGGCGG CAGGCAGATT CAGCAGTTTC CCATCAAATT TGCCGCCGTT

451 GCCACTGATT TTGAAACCGG CAAGGCCGTC GCTTTCAATC AAGGGAATGC

501 CGGGCAGGCG GTTCGTGCTT CCGCCGCCAT TCCCAATGTG TTCCAGCCAG

551 TCATCATCGG CAGGCACAAA TATGTTGACG GCGGTCTGTC GCAGCCCGTG 

601 CCCGTCAGTG CCGCTCGGCG GCAGGGGGCG AATTTCGTGA TTGCCGTCGA 

651 TATTTCCGCA CGTCCGAGCA AAAATGTCGG TCAAGGTTTC TTCTCTTATC 

701 TCGATCAGAC GCTGAACGTG ATGAGCGTTT CCGTGTTGCA AAACGAGTTG 

751 gggcAGGCGG ATGTGGTTAT CAAACCGCag gtTTTGGATT TGGGTGCAGT 

801 CGGCGGATTC GATCAGAAAA AGCGCGCCAT CCGGTTGGGC GAGGAGGCAG 

851 CACGTGCCGC ATTGCCTGAA ATCAAACGCA AACTGGCGGC ATACCGTTAT 

901 TGA 

This encodes a protein having amino acid sequence <SEQ ID 564>: 



1 MSNMVTFSKI RSFLAIAAAA LLAACGTAGN NAARKPVQTA KPAAVVALAL

51 GGGASKGFAH IGIVKVLKEN GIPVKVVTGT SAGSIVGSLL ASGMSPDRLE

101 LEAEILGKTD LVDLTLSTSG FIKGEKLQNY INRKVGGRQI QQFPIKFAAV 

151 ATDFETGKAV AFNQGNAGQA VRASAAIPNV FQPVIIGRHK YVDGGLSQPV 

201 PVSAARRQGA NFVIAVDISA RPSKNVGQGF FSYLDQTLNV MSVSVLQNEL 

251 GQADVVIKPQ VLDLGAVGGF DQKKRAIRLG EEAARAALPE IKRKLAAYRY 

301 * 

ORF137ng and ORF137-1 show 96.0% identity in 300 aa overlap: 



orf137ng MENMVTFSKIRSFLAIAAAALLAACGTAGNNAARKPVQTAKPAAVVALALGGGASKGFAH
ill INI MM : I I I I I i I I I i I i I I I I I I I : i i M M M M M I : I I I 1 I i I I i I I I I 
orf137-1 MENMVTFSKIRPLLAIAAAALLAACGTAGNNAVRKPVQTAKPAAVVGLALGGGASKGFAH



orf137ng IGIVKVLKENGIPVKVVTGTSAGSIVGSLLASGMSPDRLELEAEILGKTDLVDLTLSTSG

: I I : M M M M M M M M I M I M M I : I I I I I M M M M M M M M M I M M M 
orf137-1 VGIIKVLKENGIPVKVVTGTSAGSIVGSLFASGMSPDRLELEAEILGKTDLVDLTLSTSG



orf137ng FIKGEKLQNYINRKVGGRQIQQFPIKFAAVATDFETGKAVAFNQGNAGQAVRASAAIPNV
M M M M M M M M M M M M M M M M M M M i M I M M M M I M M M M I 
orf137-1 FIKGEKLQNYINRKVGGRQIQQFPIKFAAVATDFETGKAVAFNQGNAGQAVRASAAIPNV



orf137ng FQPVIIGRHKYVDGGLSQPVPVSAARRQGANFVIAVDISARPSKNVGQGFFSYLDQTLNV
M M M M I I I I I I I I I i I I I I I I I I I I I I I I I I I I I I I I h I I :: I I I I I I I I I I I I I 
orf137-1 FQPVIIGRHTYVDGGLSQPVPVSAARRQGANFVIAVDISARPGKNISQGFFSYLDQTLNV

orf137ng MSVSVLQNELGQADVVIKPQVLDLGAVGGFDQKKRAIRLGEEAARAALPEIKRKLAAYRY

M M : \ M M M M M M I M M M M M M M M M M M M M M M M M M M M I 
orf137-1 MSVSALQNELGQADVVIKPQVLDLGAVGGFDQKKRAIRLGEEAARAALPEIKRKLAAYRY

Based on the presence of a predicted prokaryotic membrane lipoprotein lipid attachment site 
(underlined) in the gonococcal protein, it is predicted that the proteins from N. meningitidis and
N. gonorrhoeae, and their epitopes, could be useful antigens for vaccines or diagnostics, or for 
raising antibodies. 
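The lipoprotein lipid attachment site referred to above corresponds to a PROSITE motif (PS00013). The sketch below scans for it under stated assumptions. It assumes the pattern {DERK}(6)-[LIVMFWSTAG](2)-[LIVMFYSTAGCQ]-[AGS]-C as we understand the PROSITE entry. The entry also restricts how far from the N-terminus the cysteine may lie, and this sketch omits that check. The hypothetical `find_lipobox` returns the 1-based position of the cysteine that would carry the lipid. On the first 30 residues of ORF137-1 it reports position 25, the Cys in "LLAAC".

```python
import re

# Regular-expression rendering of the assumed PS00013 pattern:
# six residues that are not D/E/R/K, two of [LIVMFWSTAG], one of
# [LIVMFYSTAGCQ], one of [AGS], then the lipidated cysteine.
LIPOBOX = re.compile(r"[^DERK]{6}[LIVMFWSTAG]{2}[LIVMFYSTAGCQ][AGS]C")


def find_lipobox(protein: str):
    """Return the 1-based position of the motif's Cys, or None if there is no match."""
    match = LIPOBOX.search(protein.upper())
    # match.end() is the 0-based index just past the Cys, i.e. its 1-based position
    return match.end() if match else None


print(find_lipobox("MENMVTFSKIRPLLAIAAAALLAACGTAGN"))  # 25
```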



Example 68 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 565>: 

1 ATGTTTCGTT TACAATTCAG GCTGTTTCCC CCTTTGCGAA CCGCCATGCA 

51 CATCCTGTTG ACCGCCCTGC TCAAATGCCT CTCCCTGcTG CCGCTTTCCT 

101 GTCTGCACAC GCTGGGAAAC CGGCTCGGAC ATCTGGCGTT TTACCTTTTA 

151 AAGGAAGACC GCGCGCGCAT CGTCGCCmAT ATGCGGCAGG CGGGTTTGAA 

201 CCCCGACCCC AAAACGGTCA AAGCCGTTTT TGCGGAAACG GCAAAAGGCG 

251 GTTTGGAACT TGCCCCCGCG TTTTTCAGAA AACCGGAAGA CATAGAAACA 

301 ATGTTCAAAG CGGTACACGG CTGGGAACAT GTGCAGCAGG CTTTGGACAA 

351 ACACGAAGGG CTGCTATTC. . 

This corresponds to the amino acid sequence <SEQ ID 566; ORF138>: 



1 MFRLQFRLFP PLRTAMHILL TALLKCLSLL PLSCLHTLGN RLGHLAFYLL 
51 KEDRARIVAX MRQAGLNPDP KTVKAVFAET AKGGLELAPA FFRKPEDIET 
101 MFKAVHGWEH VQQALDKHEG LLF 

Further work revealed the complete nucleotide sequence <SEQ ID 567>:

1 ATGTTTCGTT TACAATTCAG GCTGTTTCCC CCTTTGCGAA CCGCCATGCA

51 CATCCTGTTG ACCGCCCTGC TCAAATGCCT CTCCCTGCTG CCGCTTTCCT

101 GTCTGCACAC GCTGGGAAAC CGGCTCGGAC ATCTGGCGTT TTACCTTTTA

151 AAGGAAGACC GCGCGCGCAT CGTCGCCAAT ATGCGGCAGG CGGGTTTGAA

201 CCCCGACCCC AAAACGGTCA AAGCCGTTTT TGCGGAAACG GCAAAAGGCG

251 GTTTGGAACT TGCCCCCGCG TTTTTCAGAA AACCGGAAGA CATAGAAACA

301 ATGTTCAAAG CGGTACACGG CTGGGAACAT GTGCAGCAGG CTTTGGACAA

351 ACACGAAGGG CTGCTATTCA TCACGCCGCA CATCGGCAGC TACGATTTGG

401 GCGGACGCTA CATCAGCCAG CAGCTTCCGT TCCCGCTGAC CGCCATGTAC

451 AAACCGCCGA AAATCAAAGC GATAGACAAA ATCATGCAGG CGGGCAGGGT

501 TCGCGGCAAA GGAAAAACCG CGCCTACCAG CATACAAGGG GTCAAACAAA

551 TCATCAAAGC CCTGCGTTCG GGCGAAGCAA CCATCGTCCT GCCCGACCAC

601 GTCCCCTCCC CTCAAGAAGG CGGGGAAGGC GTATGGGTGG ATTTCTTCGG

651 CAAACCTGCC TATACCATGA CGCTGGCGGC AAAATTGGCA CACGTCAAAG

701 GCGTGAAAAC CCTGTTTTTC TGCTGCGAAC GCCTGCCTGG CGGACAAGGT

751 TTCGATTTGC ACATCCGCCC CGTCCAAGGG GAATTGAACG GCGACAAAGC

801 CCATGATGCC GCCGTGTTCA ACCGCAATGC CGAATATTGG ATACGCCGTT

851 TTCCGACGCA GTATCTGTTT ATGTACAACC GCTACAAAAT GCCGTAA



This corresponds to the amino acid sequence <SEQ ID 568; ORF138-1>:



1 MFRLQFRLFP PLRTAMHILL TALLKCLSLL PLSCLHTLGN RLGHLAFYLL

51 KEDRARIVAN MRQAGLNPDP KTVKAVFAET AKGGLELAPA FFRKPEDIET 

101 MFKAVHGWEH VQQALDKHEG LLFITPHIGS YDLGGRYISQ QLPFPLTAMY 

151 KPPKIKAIDK IMQAGRVRGK GKTAPTSIQG VKQIIKALRS GEATIVLPDH 

201 VPSPQEGGEG VWVDFFGKPA YTMTLAAKLA HVKGVKTLFF CCERLPGGQG 

251 FDLHIRPVQG ELNGDKAHDA AVFNRNAEYW IRRFPTQYLF MYNRYKMP* 

Computer analysis of this amino acid sequence gave the following results: 



Homology with a predicted ORF from N. meningitidis (strain A) 

ORF138 shows 99.2% identity over a 123aa overlap with an ORF (ORF138a) from strain A of N. 
meningitidis: 

10 20 30 40 50 60 

orf138.pep MFRLQFRLFPPLRTAMHILLTALLKCLSLLPLSCLHTLGNRLGHLAFYLLKEDRARIVAX

I | I I I I I I I I [ I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
orf138a MFRLQFRLFPPLRTAMHILLTALLKCLSLLPLSCLHTLGNRLGHLAFYLLKEDRARIVAN

10 20 30 40 50 60 

70 80 90 100 110 120 

orf138.pep MRQAGLNPDPKTVKAVFAETAKGGLELAPAFFRKPEDIETMFKAVHGWEHVQQALDKHEG
I I I I I I I I I 1 I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I 1 I I M I i I I I I I I I I I I I I 
orf138a MRQAGLNPDPKTVKAVFAETAKGGLELAPAFFRKPEDIETMFKAVHGWEHVQQALDKHEG

70 80 90 100 110 120 



orf138.pep LLF
I I I 

orf138a LLFITPHIGSYDLGGRYISQQLPFPLTAMYKPPKIKAIDKIMQAGRVRGKGKTAPTSIQG

130 140 150 160 170 180 

The complete length ORF138a nucleotide sequence <SEQ ID 569> is: 



1 ATGTTTCGTT TACAATTCAG GCTGTTTCCC CCTTTGCGAA CCGCCATGCA 

51 CATCCTGTTG ACCGCCCTGC TCAAATGCCT CTCCCTGCTG CCGCTTTCCT 

101 GTCTGCACAC GCTGGGAAAC CGGCTCGGAC ATCTGGCGTT TTACCTTTTA 

151 AAGGAAGACC GCGCGCGCAT CGTCGCCAAT ATGCGTCAGG CAGGCATGAA 

201 TCCCGACCCC AAAACGGTCA AAGCCGTTTT TGCGGAAACG GCAAAAGGCG 

251 GTTTGGAACT TGCCCCCGCG TTTTTCAGAA AACCGGAAGA CATAGAAACA 

301 ATGTTCAAAG CGGTACACGG CTGGGAACAT GTGCAGCAGG CTTTGGACAA 

351 ACACGAAGGG CTGCTATTCA TCACGCCGCA CATCGGCAGC TACGATTTGG 

401 GCGGACGCTA CATCAGCCAG CAGCTTCCGT TCCCGCTGAC CGCCATGTAC 

451 AAACCGCCGA AAATCAAAGC GATAGACAAA ATCATGCAGG CGGGCAGGGT 

501 TCGCGGCAAA GGAAAAACCG CGCCTACCAG CATACAAGGG GTCAAACAAA 

551 TCATCAAAGC CCTGCGTTCG GGCGAAGCAA CCATCGTCCT GCCCGACCAC 

601 GTCCCCTCCC CTCAAGAAGG CGGGGAAGGC GTATGGGTGG ATTTCTTCGG

651 CAAACCTGCC TATACCATGA CGCTGGCGGC AAAATTGGCA CACGTCAAAG

701 GCGTGAAAAC CCTGTTTTTC TGCTGCGAAC GCCTGCCTGG CGGACAAGGT

751 TTCGATTTGC ACATCCGCCC CGTCCAAGGG GAATTGAACG GCGACAAAGC

801 CCATGATGCC GCCGTGTTCA ACCGCAATGC CGAATATTGG ATACGCCGTT

851 TTCCGACGCA GTATCTGTTT ATGTACAACC GCTACAAAAT GCCGTAA



This encodes a protein having amino acid sequence <SEQ ID 570>: 



1 MFRLQFRLFP PLRTAMHILL TALLKCLSLL PLSCLHTLGN RLGHLAFYLL

51 KEDRARIVAN MRQAGLNPDP KTVKAVFAET AKGGLELAPA FFRKPEDIET

101 MFKAVHGWEH VQQALDKHEG LLFITPHIGS YDLGGRYISQ QLPFPLTAMY

151 KPPKIKAIDK IMQAGRVRGK GKTAPTSIQG VKQIIKALRS GEATIVLPDH

201 VPSPQEGGEG VWVDFFGKPA YTMTLAAKLA HVKGVKTLFF CCERLPGGQG

251 FDLHIRPVQG ELNGDKAHDA AVFNRNAEYW IRRFPTQYLF MYNRYKMP*

ORF138a and ORF138-1 show 99.7% identity over a 298aa overlap: 

orf138a.pep MFRLQFRLFPPLRTAMHILLTALLKCLSLLPLSCLHTLGNRLGHLAFYLLKEDRARIVAN
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I i I I I I I I I I I I I I I I I I I 
orf138-1 MFRLQFRLFPPLRTAMHILLTALLKCLSLLPLSCLHTLGNRLGHLAFYLLKEDRARIVAN

orf138a.pep MRQAGMNPDPKTVKAVFAETAKGGLELAPAFFRKPEDIETMFKAVHGWEHVQQALDKHEG
I I I I I : I I I I I I I I I I I I I I I i I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I 
orf138-1 MRQAGLNPDPKTVKAVFAETAKGGLELAPAFFRKPEDIETMFKAVHGWEHVQQALDKHEG

orf138a.pep LLFITPHIGSYDLGGRYISQQLPFPLTAMYKPPKIKAIDKIMQAGRVRGKGKTAPTSIQG
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I II I I I I I 1 I I I I I I I I I I I I I I 
orf138-1 LLFITPHIGSYDLGGRYISQQLPFPLTAMYKPPKIKAIDKIMQAGRVRGKGKTAPTSIQG

orf138a.pep VKQIIKALRSGEATIVLPDHVPSPQEGGEGVWVDFFGKPAYTMTLAAKLAHVKGVKTLFF
I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I M I I 
orf138-1 VKQIIKALRSGEATIVLPDHVPSPQEGGEGVWVDFFGKPAYTMTLAAKLAHVKGVKTLFF

orf138a.pep CCERLPGGQGFDLHIRPVQGELNGDKAHDAAVFNRNAEYWIRRFPTQYLFMYNRYKMP
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I II I I I I I I I I I I I I I I 
orf138-1 CCERLPGGQGFDLHIRPVQGELNGDKAHDAAVFNRNAEYWIRRFPTQYLFMYNRYKMP



Homology with a predicted ORF from N. gonorrhoeae

ORF138 shows 94.3% identity over a 123aa overlap with a predicted ORF (ORF138ng) from
N. gonorrhoeae:

orf138.pep MFRLQFRLFPPLRTAMHILLTALLKCLSLLPLSCLHTLGNRLGHLAFYLLKEDRARIVAX 60
I I I I I I I I I I I I II I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I M I I I
orf138ng MFRLQFRLFPPLRTAMHILLTALLKCLSLLSLSCLHTLGNRLGHLAFYLLKEDRARIVAN 60

orf138.pep MRQAGLNPDPKTVKAVFAETAKGGLELAPAFFRKPEDIETMFKAVHGWEHVQQALDKHEG 120
I 1 I I I I I I I : I I I I I 1 I I I I I II I I I I 1 I I : I I II I I I I I I I I I II I I I I II I II II
orf138ng MRQAGLNPDTQTVKAVFAETAKCGLELAPAFFKKPEDIETMFKAVHGWEHVQQALDKGEG 120

orf138.pep LLF 123
I I I
orf138ng LLFITPHIGSYDLGGRYISQQLPFHLTAMYKPPKIKAIDKIMQAGRVRGKGKTAPTGIQG 180

The complete length ORF138ng nucleotide sequence <SEQ ID 571> is:

1 ATGTTTCGTT TACAATTCAG GCTGTTTCCC CCTTTGCGAA CCGCCATGCA

51 CATCCTGTTG ACCGCCCTGC TCAAATGCCT CTCCCTGCTG TCGCTTTCCT

101 GTCTGCACAC GCTGGGAAAC CGGCTCGGAC ATCTGGCGTT TTACCTTTTA

151 AAGGAAGACC GCGCGCGCAT CGTCGCCAAT ATGCGGCAGG CGGGTTTGAA

201 CCCCGACACG CAGACGGTCA AAGCCGTTTT TGCGGAAACG GCAAAATGCG

251 GTTTGGAACT TGCCCCCGCG TTTTTCAAAA AACCGGAAGA CATCGAAACA

301 ATGTTCAAAG CGGTACACGG CTGGGAACAC GTGCAGCAGG CTTTGGACAA

351 GGGCGAAGGG CTGCTGTTCA TCACGCCGCA CATCGGCAGC TACGATTTGG

401 GCGGACGCTA CATCAGCCAG CAGCTTCCGT TCCACCTGAC CGCCATGTAC

451 AAGCCGCCGA AAATCAAAGC GATAGACAAA ATCATGCAGG CGGGCAGGGT

501 GCGCGGCAAA GGCAAAACcg cgcccaccgg catACAAGGG GTCAAACAAA

551 tcatcaAGGC CCTGCGCGCG GGCGAGGCAA CCAtcATCCT GCCCGACCAC

601 GTCCCTTCTC CGCAGGAagg cggCGGCGTG TGGGCGGATT TTTTCGGCAA

651 ACCTGCATAc acCATGACAC TGGCGGCAAA ATTGGCACAC GTCAAAGGCG 

701 TGAAAACCCT GTTTTTCTGC TGCGAACGCC TGCCCGACGG ACAAGGCTTC 

751 GTGTTGCACA TCCGCCCCGT CCAAGGGGAA TTGAACGGCA ACAAAGCCCA 

801 CGATGCCGCC GTGTTCAACC GCAATACCGA ATATTGGATA CGCCGTTTTC

851 CGACGCAGTA TCTGTTTATG TACAACCGCT ATAAAACGCC GTAA 

This encodes a protein having amino acid sequence <SEQ ID 572>: 

1 MFRLQFRLFP PLRTAMHILL TALLKCLSLL SLSCLHTLGN RLGHLAFYLL

51 KEDRARIVAN MRQAGLNPDT QTVKAVFAET AKCGLELAPA FFKKPEDIET

101 MFKAVHGWEH VQQALDKGEG LLFITPHIGS YDLGGRYISQ QLPFHLTAMY

151 KPPKIKAIDK IMQAGRVRGK GKTAPTGIQG VKQIIKALRA GEATIILPDH 

201 VPSPQEGGGV WADFFGKPAY TMTLAAKLAH VKGVKTLFFC CERLPDGQGF 

251 VLHIRPVQGE LNGNKAHDAA VFNRNTEYWI RRFPTQYLFM YNRYKTP* 

ORF138ng and ORF138-1 show 94.3% identity over 299aa overlap: 

orf138-1.pep MFRLQFRLFPPLRTAMHILLTALLKCLSLLPLSCLHTLGNRLGHLAFYLLKEDRARIVAN

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I i I I I I I I I I I I I I I 
orf138ng MFRLQFRLFPPLRTAMHILLTALLKCLSLLSLSCLHTLGNRLGHLAFYLLKEDRARIVAN

orf138-1.pep MRQAGLNPDPKTVKAVFAETAKGGLELAPAFFRKPEDIETMFKAVHGWEHVQQALDKHEG
| | | | | | | | | : | | | I I I | I | I I | I | | I | I | I : I I I I | I I I I I I I || | | | | | | | | | | | |

orf138ng MRQAGLNPDTQTVKAVFAETAKCGLELAPAFFKKPEDIETMFKAVHGWEHVQQALDKGEG

orf138-1.pep LLFITPHIGSYDLGGRYISQQLPFPLTAMYKPPKIKAIDKIMQAGRVRGKGKTAPTSIQG
I I I I I I I II I I I I I I I I I I I I I I I i I I I I I I I I I I I I I M I I I I I I I I 1 I I I I I I : I I I 
orf138ng LLFITPHIGSYDLGGRYISQQLPFHLTAMYKPPKIKAIDKIMQAGRVRGKGKTAPTGIQG






orf138-1.pep VKQIIKALRSGEATIVLPDHVPSPQEGGEGVWVDFFGKPAYTMTLAAKLAHVKGVKTLFF
I I I I I I I I I : I I I I I : I I I I I I I I I I I I I I I : I II I I I I II I I I I II I I I I I I I I I I I I 
orf138ng VKQIIKALRAGEATIILPDHVPSPQEGG-GVWADFFGKPAYTMTLAAKLAHVKGVKTLFF

orf138-1.pep CCERLPGGQGFDLHIRPVQGELNGDKAHDAAVFNRNAEYWIRRFPTQYLFMYNRYKMP
I I I I I I I I I I I I I I I I I I 1 I II : I I I I I I I I I I I : I I I I I I I I I I I I I I I I I I I I 
orf138ng CCERLPDGQGFVLHIRPVQGELNGNKAHDAAVFNRNTEYWIRRFPTQYLFMYNRYKTP

In addition, ORF138ng is homologous to htrB protein from Pseudomonas fluorescens: 

gnl|PID|e334283 (Y14568) htrB [Pseudomonas fluorescens] Length = 253

Score = 80.8 bits (196), Expect = 9e-15 

Identities = 49/151 (32%), Positives = 79/151 (51%), Gaps = 6/151 (3%)

Query: 101 MFKAVHGWEHVQQALDKGEGLLFITPHIGSYD-LGGRYISQQLPFHLTAMYKPPKIKAID 159 
+ + V G E +++AL G+G++ IT H+G+++ L Y SQ P Y+PPK+KA+D

Sbjct: 94 LVREVEGLEVLKEALASGKGWGITSHLGNWEVLNHFYCSQCKPI 1 FYR P PKLKAV D 150 

Query: 160 KIMQAGRVRGKGKTAPTGIQGVKQIIKALRAGEATIILPDHVPSPQEGGGVWADFFGKPA 219 
++++ RV+ K A + +G+ +IK +R G I D P P E G++ FF A 
Sbjct: 151 ELLRKQRVQLGNKVAASTKEGILSVIKEVRKGGQVGIPAD--PEPAESAGIFVPFFATQA 208

Query: 220 YTMTLAAKLAHVKGVKTLFFCCERLPDGQGF 250

T + +F RLPDG G+ 

Sbjct: 209 LTSKFVPNMLAGGKAVGVFLHALRLPDGSGY 239 
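For reference, the Score and Expect lines in this report come from the Karlin-Altschul statistics that BLAST uses. The raw alignment score S is converted to a bit score S', and E estimates how many alignments at least this good would arise by chance in a search space of effective query length m and database length n. The parameters λ and K depend on the scoring matrix and gap costs. This excerpt does not report them, so the conversion from 196 to 80.8 bits cannot be re-derived from it alone.

$$
S' = \frac{\lambda S - \ln K}{\ln 2}, \qquad E = m\,n\,2^{-S'}
$$

The percentages are simple ratios over the 151 aligned columns. For example, Identities = 49/151 ≈ 0.32.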

Based on this analysis, including the presence of a putative transmembrane domain in the
gonococcal protein, it was predicted that the proteins from N. meningitidis and N. gonorrhoeae, and 
their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies. 

ORF138-1 (57kDa) was cloned in the pGex vectors and expressed in E. coli, as described above.
The products of protein expression and purification were analyzed by SDS-PAGE. Figure 14A
shows the results of affinity purification of the GST-fusion protein. Purified GST-fusion protein
was used to immunise mice, whose sera were used for ELISA (positive result) and FACS analysis
(Figure 14B). These experiments confirm that ORF138-1 is a surface-exposed protein, and that it
is a useful immunogen. 
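A rough check when comparing an SDS-PAGE band with a sequence is the polypeptide's predicted average mass: the sum of the average residue masses plus one water. The sketch below uses standard average residue masses. It ignores the fusion partner (the GST tag sequence is not given here), any processing, and anomalous gel migration. It is therefore only a guide and not the basis of the 57kDa figure quoted above. The function name `average_mass` is hypothetical.

```python
# Average residue masses in daltons (amino-acid masses minus one water).
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01528  # added once for the free N- and C-termini


def average_mass(protein: str) -> float:
    """Predicted average mass (Da) of an unmodified polypeptide; '*' is ignored."""
    residues = [res for res in protein.upper() if res != "*"]
    unknown = sorted({res for res in residues if res not in RESIDUE_MASS})
    if unknown:
        raise ValueError(f"no mass defined for: {''.join(unknown)}")
    return sum(RESIDUE_MASS[res] for res in residues) + WATER
```

For example, `average_mass(seq) / 1000` gives the estimate in kDa for the ORF138-1 sequence (SEQ ID 568) with spaces removed. Residues listed as X must be resolved first, because the function rejects them.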



Example 69 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 573>:

1 . . GCGTGGTCGG CCGGCGAATC GTGGCGTGTG TTAATGGAAA GTGAAACGTG 

51 GCATGCGGTG TGGAATACTT TGCGCTTCTC GGCGGCGGCG GTGTATGCGG 

101 CAGCGGTTTT GGGTGTGGTG TATGCGGCGC CGGCGCGGCG GTCGGCGTGG 

151 ATGCGCGGGC TGATGTTTTA GCCGTTTATG GTGTCGCCGG TTTGTGTTTC 

201 GGCGGGCGTG CTGCTGCTTT ATCCGCAGTG GACGGCTTCG TTGCCGTTGC 

251 TGCTGGCGAT GTATGCGCTG CTGGCGTATC CGTTTGTGGC AAAAGATGTT 

301 TTATCAGCCT GGGATGCACT GCCGCCGGAT TACGGCAGGG CGGCGGCGGG 

351 TTTGGGTGCA AACGGCTTTC AGACGGCATG CCGCATCACG TTCCCCCTCT 

401 TGAAACCGGC GTTGCGGCGC GGTCTGACTT TGGCGGCGGC AACCTGCGTG 

451 GGCGAATTTG CGGCGACATT GTTTCTGTCG CGTCCGGAAT GGCAGACGCT 

501 GACGACTTTG ATTTATGCCT ATTTGGGACG CGCGGGTGAG GATAATTACG 

551 CGCGGGCGAT GGTGCTG. . 

This corresponds to the amino acid sequence <SEQ ID 574; ORF139>: 



1 . .AWSAGESWRV LMESETWHAV WNTLRFSAAA VYAAAVLGVV YAAPARRSAW 

51 MRGLMFXPFM VSPVCVSAGV LLLYPQWTAS LPLLLAMYAL LAYPFVAKDV 

101 LSAWDALPPD YGRAAAGLGA NGFQTACRIT FPLLKPALRR GLTLAAATCV 

151 GEFAATLFLS RPEWQTLTTL IYAYLGRAGE DNYARAMVL. . 

Further work revealed the complete nucleotide sequence <SEQ ID 575>: 



1 ATGGATGGAC GGCGTTGGGT GGTATGGGGT GCTTTTGCCC TGCTGCCTTC 

51 GGCTTTTTTG GCGGTAATGG TCGTTGCGCC TTTGTGGGCG GTGGCGGCGT 

101 ATGACGGTTT GGCGTGGCGC GCGGTGCTGT CGGATGCCTA TATGCTCAAA 

151 CGTTTGGCGT GGACGGTATT TCAGGCAGCG GCAACCTGTG TGCTGGTGCT 

201 GCCTTTGGGC GTGCCTGTCG CGTGGGTGCT GGCGCGGCTG GCGTTTCCGG 

251 GGCGGGCTTT GGTGCTGCGC CTGCTGATGC TGCCTTTTGT GATGCCCACG 

301 TTGGTGGCGG GCGTGGGCGT GCTGGCCCTG TTCGGGGCGG ACGGGCTGTT 

351 GTGGCGCGGC AGGCAGGATA CGCCGTATCT GTTGTTGTAC GGCAATGTGT 

401 TTTTCAACCT TCCTGTGTTG GTCAGGGCGG CGTATCAGGG GTTTGTGCAA 

451 GTGCCTGCGG CACGGCTTCA GACGGCACGG ACGTTGGGCG CGGGGGCGTG 

501 GCGGCGGTTT TGGGACATTG AAATGCCCGT TTTGCGCCCG TGGCTTGCCG 

551 GCGGCGTGTG CCTTGTCTTT CTGTATTGTT TTTCCGGGTT CGGGCTGGCG 

601 CTGCTGCTGG GCGGCAGCCG TTATGCCACG GTCGAAGTGG AAATTTACCA 

651 GTTGGTCATG TTCGAACTCG ATATGGCGGT TGCTTCGGTG CTGGTGTGGC 

701 TGGTGTTGGG GGTAACGGCG GCGGCAGGGT TGCTGTATGC GTGGTTCGGC 

751 AGGCGCGCGG TTTCGGATAA GGCGGTTTCC CCTGTGATGC CGTCGCCGCC 

801 GCAGTCGGTC GGGGAATATG TGCTGCTGGC GTTTGCGGCG GCGGTGTTGT 

851 CTGTGTGCTG CCTGTTTCCT TTGTTGGCAA TTGTTGTGAA AGCGTGGTCG 

901 GCCGGCGAAT CGTGGCGTGT GTTAATGGAA AGTGAAACGT GGCAGGCGGT 

951 GTGGAATACT TTGCGCTTCT CGGCGGCGGC GGTGTATGCG GCGGCGGTTT 

1001 TGGGTGTGGT GTATGCGGCG GCGGCGCGGC GGTCGGCGTG GATGCGCGGG 

1051 CTGATGTTTT TGCCGTTTAT GGTGTCGCCG GTTTGTGTTT CGGCGGGCGT 

1101 GCTGCTGCTT TATCCGCAGT GGACGGCTTC GTTGCCGTTG CTGCTGGCGA 

1151 TGTATGCGCT GCTGGCGTAT CCGTTTGTGG CAAAAGATGT TTTATCAGCC 

1201 TGGGATGCAC TGCCGCCGGA TTACGGCAGG GCGGCGGCGG GTTTGGGTGC 

1251 AAACGGCTTT CAGACGGCAT GCCGCATCAC GTTCCCCCTC TTGAAACCGG 

1301 CGTTGCGGCG CGGTCTGACT TTGGCGGCGG CAACCTGCGT GGGCGAATTT 

1351 GCGGCGACAT TGTTTCTGTC GCGTCCGGAA TGGCAGACGC TGACGACTTT 

1401 GATTTATGCC TATTTGGGAC GCGCGGGTGA GGATAATTAC GCGCGGGCGA 

1451 TGGTGCTGAC ATTGCTGTTG GCGGCGTTCG CGCTGGGTAT TTTCCTGCTG 

1501 TTGGACGGCG GCGAAGGCGG AAAACAGACG GAAACGTTAT AA 



This corresponds to the amino acid sequence <SEQ ID 576; ORF139-1>: 



1 MDGRRWVVWG AFALLPSAFL AVMVVAPLWA VAAYDGLAWR AVLSDAYMLK 

51 RLAWTVFQAA ATCVLVLPLG VPVAWVLARL AFPGRALVLR LLMLPFVMPT 

101 LVAGVGVLAL FGADGLLWRG RQDTPYLLLY GNVFFNLPVL VRAAYQGFVQ 

151 VPAARLQTAR TLGAGAWRRF WDIEMPVLRP WLAGGVCLVF LYCFSGFGLA 

201 LLLGGSRYAT VEVEIYQLVM FELDMAVASV LVWLVLGVTA AAGLLYAWFG 

251 RRAVSDKAVS PVMPSPPQSV GEYVLLAFAA AVLSVCCLFP LLAIVVKAWS 

301 AGESWRVLME SETWQAVWNT LRFSAAAVYA AAVLGVVYAA AARRSAWMRG 

351 LMFLPFMVSP VCVSAGVLLL YPQWTASLPL LLAMYALLAY PFVAKDVLSA 

401 WDALPPDYGR AAAGLGANGF QTACRITFPL LKPALRRGLT LAAATCVGEF 

451 AATLFLSRPE WQTLTTLIYA YLGRAGEDNY ARAMVLTLLL AAFALGIFLL 

501 LDGGEGGKQT ETL* 

Computer analysis of this amino acid sequence gave the following results: 



Homology with a predicted ORF from N. meningitidis (strain A) 

ORF139 shows 94.7% identity over a 189aa overlap with an ORF (ORF139a) from strain A of N. 
meningitidis: 



10 20 30 

orf 139 .pep AW SAGE SWRVLME SET WHAVWNT LRFS AAA 

I I I I I I I I I I I I I I I I I : I I I I I I I I I I I 
orf 13 9a QSVGEYVLLAF AAAVXSVCCLFXLLAIW KAWSAGESWRVLMESETWQAVWNTXRFSAAA 
270 280 290 300 310 320 



40 50 60 70 80 90 

orf 139 . pep VYAAAVLG W YAAP ARRS AWMRGLM FXP FMV S PVCVS AGVL LL Y PQWTAS L PL L LAM Y AL 

I I I I I I I I I I I I I I I I I I I I i I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I 
or f 1 3 9a VYAAAVLGWYAAA ARRSAWMRGLM FLPFMVS PVCVS AGVLLL XPQWT AS LPLLLAMYAL 

330 340 350 360 370 380 



100 110 120 130 140 150 

orf 13 9. pep LAYPFVA KDVLSAWDALPPDYGRAAAGLGANGFQTACRITFPLLKPALRRGLTLAAATCV 

I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
orf 139a LAYPFVA KDVLSAXDALPPDYGRAAAGLGANGFQTACRITFPLLKPALRRGLTLAAATCV 

390 400 410 420 430 440 



160 170 180 189 

orf 139. pep GEFAATLFLSRPEWQTLTTLIYAYLGRAGEDNYARAMVL 

I I I I I I I I II I I I I I I 1 1 I I I I MM I I II I M I I 
orf 13 9a GEFAATLFXSRXEWQTLTTLIYAYXGRAGXDNYARA MVLTLLLAAFALGXFLLL DGGEGG 

450 460 470 480 490 500 

The complete length ORF139a nucleotide sequence <SEQ ID 577> is: 



1 ATGGATGGAC GGCGTTGGGC GGTATGGGGT GCTTTTGCCC TGCTGCCTTC 

51 GGCTTTTTTG GCGGCAATGG TCGTTGCGCC TTTGTGGGCG GTGGCGGCGT 

101 ATGACGGTTT GGCGTGGCGC GCGGTGCTGT CGGATGCCTA TATGCTCAAA 

151 CGTTTGGCGT GGACGGTATT TCAGGCAGCG GCAACCTGTG TGCTGGTGCT 

201 GCCTTTGGGC GTGCCTGTCG CGTGGGTGCT GGCGCGGCTG GCGTTTCCGG 

251 GGCGGGCTTT GGTGCTGCGC CTGCTGATGC TGCCTTTTGT GATGCCCACG 

301 TTGGTGGCGG GCGTGGGCGT GCTGGCTCTG TTCGGGGCGG ACGGCCTGTN 

351 GTGGCGCGGC TGGCAGGATA CGCCGTATCT GTTGTTGTAC GGCAATGTGT 

401 TTTTTNACCT TCCTGTGTTG GTCAGGGCGG CATATCAGGG GTTTGTGCAA 

451 GTGCCTGCGG CACGGCTTCA GACGGCACNG ACATTGGGCG CGGGGGCGTG 

501 GCGGCGGTTT TGGGACATTG AAATGCCCGT TTTGCGCCCG TGGCTTGCCG 

551 GCGGCGTGTG CCTTGTCTTC CTGTATTGTT TTTCGGGGTT CGGGCTGGCA 

601 TTGCTGCTGG GCGGCAGCCG TTATGCCACG GTCGAAGTGG AAATTTACCA 

651 GTTGGTCATG TTCGAACTCG ATATGGCGGT TGCTTCGGTG CTNGTGTGGC 

701 TGGTGTNGGG GGTAACNGCG GCGGCAGGGT TGCTGTATGC GTGGTTCGGC 

751 AGGCGCGCGG TTTCGGATAA GGCNGTTTCC CCTGTGATGC CGTCGCCGCC 

801 GCAGTCGGTC GGGGAATATG TGCTNCTGGC GTTTGCGGCG GCGGTGTNGT 

851 CTGTGTGCTG CCTGTTTCNT TTGTTGGCAA TTGTTGTGAA AGCGTGGTCG 

901 GCCGGCGAAT CGTGGCGTGT GTTAATGGAA AGTGAAACGT GGCAGGCGGT 

951 GTGGAATACT NTGCGCTTCT CGGCGGCGGC GGTGTATGCG GCGGCGGTTT 

1001 TGGGTGTGGT GTATGCGGCG GCGGCGCGGC GGTCGGCGTG GATGCGCGGG 

1051 CTGATGTTTT TGCCGTTTAT GGTGTCGCCG GTTTGTGTTT CGGCGGGCGT 

1101 GCTGCTGCTT NATCCGCAGT GGACGGCTTC GTTGCCGCTG CTGCTGGCGA 

1151 TGTATGCGCT GCTGGCGTAT CCGTTTGTGG CAAAAGATGT TTTATCAGCC 

1201 TGNGATGCAC TGCCGCCGGA TTACGGCAGG GCGGCGGCGG GTTTGGGTGC 

1251 AAACGGCTTT CAGACGGCAT GCCGCATCAC GTTCCCCCTC TTGAAACCGG 

1301 CGTTGCGGCG CGGTCTGACT TTGGCGGCGG CAACCTGCGT GGGCGAATTT 

1351 GCGGCAACCT TGTTCNTGTC GCGTCNCGAG TGGCAGACGC TGACGACTTT 

1401 GATTTATGCC TATNTGGGAC GCGCGGGTGA NGATAATTAC GCGCGGGCGA 

1451 TGGTGCTGAC ATTGCTGTTG GCGGCGTTCG CGCTGGGTAT NTTCCTGCTG 

1501 TTGGACGGCG GCGAAGGCGG AAAACGGACG GAAACGTTAT AA 

This encodes a protein having amino acid sequence <SEQ ID 578>: 



1 MDGRRWAVWG AFALLPSAFL AAMVVAPLWA VAAYDGLAWR AVLSDAYMLK 

51 RLAWTVFQAA ATCVLVLPLG VPVAWVLARL AFPGRALVLR LLMLPFVMPT 

101 LVAGVGVLAL FGADGLXWRG WQDTPYLLLY GNVFFXLPVL VRAAYQGFVQ 

151 VPAARLQTAX TLGAGAWRRF WDIEMPVLRP WLAGGVCLVF LYCFSGFGLA 

201 LLLGGSRYAT VEVEIYQLVM FELDMAVASV LVWLVXGVTA AAGLLYAWFG 

251 RRAVSDKAVS PVMPSPPQSV GEYVLLAFAA AVXSVCCLFX LLAIVVKAWS 

301 AGESWRVLME SETWQAVWNT XRFSAAAVYA AAVLGVVYAA AARRSAWMRG 

351 LMFLPFMVSP VCVSAGVLLL XPQWTASLPL LLAMYALLAY PFVAKDVLSA 

401 XDALPPDYGR AAAGLGANGF QTACRITFPL LKPALRRGLT LAAATCVGEF 

451 AATLFXSRXE WQTLTTLIYA YXGRAGXDNY ARAMVLTLLL AAFALGXFLL 

501 LDGGEGGKRT ETL* 



ORF139a and ORF139-1 show 96.5% homology over a 514aa overlap: 






orf 13 9a. pep MDGRRWAVWGAFALLPSAFLAAMVVAPLWAVAAYDGLAWRAVLSDAYMLKRLAWTVFQAA 
I II I I I: I I II I I I I II I I 11:1 I I M I I I I I I I II II i I I I I 1 I I 1 I I I I I I Ml I I I I 
or f 1 3 9- 1 M DGRRW WWGAFALL P S AFLAVMWAPLWAVAAY DG LAWRAVL S DAYMLKRLAWT V FQAA 

or f 1 3 9a . pep ATCVLVLPLGVPVAWVLARLAFPGRALVLRLLMLPFVMPTLVAGVGVLALFGADGLXWRG 
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 1 I I I I III 
or f 1 3 9- 1 ATCVLVLPLGVPVAWVLARLAFPGRALVLRLLMLPFVMPTLVAGVGVLALFGADGLLWRG 

or f 13 9a. pep WQDTPYLLLYGNVFFXLPVLVRAAYQGFVQVPAARLQTAXTLGAGAWRRFWDIEMPVLRP 
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I 
orf 139-1 RQDTPYLLLYGNVFFNLPVLVRAAYQGFVQVPAARLQTARTLGAGAWRRFWDIEMPVLRP 

orf 139a. pep WLAGGVCLVFLYCFSGFGLALLLGGSRYATVEVEIYQLVMFELDMAVASVLVWLVXGVTA 
I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I II I I I I I I I I I I I I I I I ! I 1 I I 
or f 1 3 9- 1 WLAGGVCLVFLYCFSGFGLALLLGGSRYATVEVEI YQLVMFELDMAVASVLVWLVLGVTA 

orf 13 9a. pep AAGLLYAWFGRRAVSDKAVSPVMPSPPQSVGEYVLLAFAAAVXSVCCLFXLLAIWKAWS 
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I II I I I I II I I I I I I I I I I II I I II I I 
orf 139-1 AAGLLYAWFGRRAVSDKAVSPVMPSPPQSVGEYVLLAFAAAVLSVCCLFPLLAIWKAWS 

orf 139a . pep AGESWRVLMESETWQAVWNTXRFSAAAVYAAAVLGWYAAAARRSAWMRGLMFLPFMVSP 
I I I I I I I II I I I I I I I I I I 1 I I I I I I I I I I I I I I I I I I I I I I I I I 1 I I I I I I I I I I I I I 
or f 1 3 9- 1 AGESWRVLMESETWQAVWNTLRFSAAAVYAAAVLGWYAAAARRSAWMRGLMFLPFMVS P 

orf 139a . pep VCVSAGVLLLXPQWTASLPLLLAMYALLAYPFVAKDVLSAXDALPPDYGRAAAGLGANGF 
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I 
orf 139-1 VCVSAGVLLLYPQWTASLPLLLAMYALLAYPFVAKDVLSAWDALPPDYGRAAAGLGANGF 

orf 13 9a . pep QTACRITFPLLKPALRRGLTLAAATCVGEFAATLFXSRXEWQTLTTLI YAYXGRAGXDNY 
I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I II II I I I I I II II I I I I I Ml 
orf 139-1 QTACRITFPLLKPALRRGLTLAAATCVGEFAATLFLSRPEWQTLTTLIYAYLGRAGEDNY 

orf 139a . pep ARAMVLTLLLAAFALGXFLLLDGGEGGKRTETLX 
I I I I I I I I I I I I I I I I 1 I I I I I I I I I I : I I I I I 
or f 1 3 9- 1 ARAMVLTLLLAAFALGI FLLLDGGEGGKQTETLX 

Homology with a predicted ORF from N. gonorrhoeae 

ORF139 shows 95.2% identity over a 189aa overlap with a predicted ORF (ORF139ng) from 
N. gonorrhoeae: 






orf 139. pep AWSAGESWRVLMESETWHAVWNTLRFSAAA 30 

I I I II II I II I I I I II : I I I I I I II I I II 

orfl39ng QSVGEYVLLAFSVAVLSVCCLFPLSAIWKAWSAGESRRVLMESETWQAVWNTLRFSAAA 327 

or f 1 3 9 . pep VYAAAVLGWYAAPARRSAWMRGLMFXPFMVSPVCVSAGVLLLYPQWTASLPLLLAMYAL 9 0 

I : | I I | I I I I I I 1 I I I : I I I I I : I I I 11 I I I 1 I I I I I 11 I I I I I I I I I I I I I I I I I 

or f 1 3 9ng VFAAAVLGWYAAAARRLVWMRGLVFLPFMVS PVCVS AGVLLLYPGWTASLPLLLAMYAL 387 
orf139.pep LAYPFVAKDVLSAWDALPPDYGRAAAGLGANGFQTACRITFPLLKPALRRGLTLAAATCV 150 

I I I I I I I I I I j I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II II I I I I I 
orf139ng LAYPFVAKDVLSAWDALPPDYGRAAAGLGANGFQTACRITFPLLKPALRRGLTLAAATCV 447 

orf 139 . pep GEFAATLFLSRPEWQTLTTLIYAYLGRAGEDNYARAMVL 189 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
orfl39ng GEFAATLFLSRPEWQTLTTLIYAYLGRAGEDNYARAMVLTLLLSAFAVCIFLLLDNGEGG 507 

The complete length ORF139ng nucleotide sequence <SEQ ID 579> is predicted to encode a 
protein having amino acid sequence <SEQ ID 580>: 

1 MDGRCWAVRG AFSLLPSAFL AVMVVAPLWA VAAYDGLAWR AVLSDAYMLK 

51 RLAWTVFQAA ATCVLVLPLG VPVAWVLARL AFPGRALVLR LLMLPFVMPT 

101 LVAGVGVLAL FGADGLLWRG RQDTPYLLLY GNVFFNLPVL VRAAYQGFAQ 

151 VPAARLQTAR TLGAGAWRPF WDIEMPVLRP WLAGGVCLVF LYCFSGFGLA 

201 LLLGGSRYAT VEVEIYQLVM FELDMAGASA LVWLVLGVTA AAGLLYAWFG 

251 RRAVSDKAVS PVMPSPPQSV GEYVLLAFSV AVLSVCCLFP LSAIVVKAWS 

301 AGESRRVLME SETWQAVWNT LRFSAAAVFA AAVLGVVYAA AARRLVWMRG 

351 LVFLPFMVSP VCVSAGVLLL YPGWTASLPL LLAMYALLAY PFVAKDVLSA 

401 WDALPPDYGR AAAGLGANGF QTACRITFPL LKPALRRGLT LAAATCVGEF 

451 AATLFLSRPE WQTLTTLIYA YLGRAGEDNY ARAMVLTLLL SAFAVCIFLL 

501 LDNGEGGKRT ETL* 

Further work revealed a variant gonococcal DNA sequence <SEQ ID 581>: 

1 ATGGATGGAC GGTGTTGGGC GGTACGGGGT GCTTTTTCCC TGCTGCCTTC 

51 GGCTTTTTTG GCGGTAATGG TCGTTGCGCC TTTGTGGGCG GTGGCGGCGT 

101 ATGACGGTTT GGCGTGGCGC GCGGTGCTGT CGGATGCCTA TATGCTCAAA 

151 CGTTTGGCGT GGACGGTGTT TCAGGCGGCG GCAACCTGTG TGCTGGTGCT 

201 GCCTTTGGGC GTGCCTGTCG CGTGGGTGCT GGCGCGGCTG GCGTTCCCGG 

251 GGCGGGCTTT GGTGCTGCGC CTGCTGATGC TGCCGTTTGT GATGCCCACG 

301 CTGGTGGCGG GCGTGGGCGT GCTGGCTCTG TTCGGGGCGG ACGGGCTGTT 

351 GTGGCGCGGC CGGCAGGATA CGCCGTATCT GTTGTTGTAC GGCAATGTGT 

401 TTTTCAACCT GCCCGTGTTG GTCAGGGCGG CGTATCAGGG GTTTGCTCAA 

451 GTGCCTGCGG CACGGCTTCA GACGGCACGG ACGTTGGGCG CGGGGGCGTG 

501 GCGGCGGTTT TGGGACATTG AAATGCCCGT TTTGCGCCCG TGGCTTGCCG 

551 GCGGCGTGTG CCTTGTCTTC CTGTATTGTT TTTCGGGGTT CGGGCTGGCA 

601 TTGCTGTTGG GCGGCAGCCG TTATGCCACG GTCGAAGTGG AAATTTACCA 

651 GTTGGTTATG TTCGAACTCG ATATGGCGGG GGCTTCGGCG CTGGTGTGGC 

701 TGGTGTTGGG GGTAACGGCG GCGGCAGGGT TGCTGTATGC GTGGTTCGGC 

751 AGGCGCGCGG TTTCGGATAA GGCGGTTTCC CCCGTGATGC CGTCGCCGCC 

801 GCAATCGGTG GGGGAATATG TATTGCTGGC ATTTTCGGTG GCGGTGTTGT 

851 CCGTGTGCTG CCTGTTTCCT TTGTCGGCAA TTGTTGTGAA AGCGTGGTCG 

901 GCCGGCGAAT CGCGGCGTGT GTTAATGGAA AGTGAAACGT GGCAGGCAGT 

951 GTGGAATACT TTGCGCTTTT CGGCGGCGGC GGTGTTTGCG GCGGCGGTTT 

1001 TGGGTGTGGT GTATGCGGCG GCGGCGCGGC GGCTGGTGTG GATGCGCGGA 

1051 CTGGTGTTTT TACCGTTTAT GGTGTCGCCG GTTTGTGTTT CGGCGGGCGT 

1101 GCTGCTGCTT TATCCGGGGT GGACGGCTTC GTTACCGCTG CTGCTGGCGA 

1151 TGTATGCGCT GCTGGCGTAT CCGTTTGTGG CAAAAGATGT TTTATCGGCC 

1201 TGGGATGCAC TGCCGCCGGA TTACGGCAGG GCGGCGGCAG GTTTGGGCGC 

1251 AAACGGCTTT CAGACGGCAT GCCGTATCAC GTTCCCCCTC TTGAAACCGG 

1301 CGTTGCGGCG CGGTCTGACT TTGGCGGCGG CGACGTGTGT GGGCGAATTT 

1351 GCGGCAACCT TGTTCCTGTC GCGTCCGGAA TGGCAGACGT TGACGACTTT 

1401 GATTTATGCC TATTTGGGGC GTGCGGGTGA GGACAATTAT GCGCGGGCAA 

1451 TGGTGTTGAC ATTGCTGTTG TCGGCATTTG CGGTGTGCAT TTTCCTGCTG 

1501 TTGGACAACG GCGAAGGCGG AAAACGGACG GAAACGTTAT AA 

This corresponds to the amino acid sequence <SEQ ID 582; ORF139ng-1>: 

1 MDGRCWAVRG AFSLLPSAFL AVMVVAPLWA VAAYDGLAWR AVLSDAYMLK 

51 RLAWTVFQAA ATCVLVLPLG VPVAWVLARL AFPGRALVLR LLMLPFVMPT 

101 LVAGVGVLAL FGADGLLWRG RQDTPYLLLY GNVFFNLPVL VRAAYQGFAQ 

151 VPAARLQTAR TLGAGAWRRF WDIEMPVLRP WLAGGVCLVF LYCFSGFGLA 

201 LLLGGSRYAT VEVEIYQLVM FELDMAGASA LVWLVLGVTA AAGLLYAWFG 

251 RRAVSDKAVS PVMPSPPQSV GEYVLLAFSV AVLSVCCLFP LSAIVVKAWS 

301 AGESRRVLME SETWQAVWNT LRFSAAAVFA AAVLGVVYAA AARRLVWMRG 

351 LVFLPFMVSP VCVSAGVLLL YPGWTASLPL LLAMYALLAY PFVAKDVLSA 

401 WDALPPDYGR AAAGLGANGF QTACRITFPL LKPALRRGLT LAAATCVGEF 

451 AATLFLSRPE WQTLTTLIYA YLGRAGEDNY ARAMVLTLLL SAFAVCIFLL 

501 LDNGEGGKRT ETL* 
ORF139ng-1 and ORF139-1 show 95.9% identity over a 513aa overlap: 

orfl39ng MDGRCWAVRGAFSLLPSAFLAVMWAPLWAVAAYDGLAWRAVLSDAYMLKRLAWTVFQAA 
I I I I 1:1 I I I : I I I I I I I I I I I I I I I I II I I I I I I I I I I II I I I I I I I I I I I I II I I I 
orf 139-1 MDGRRWVWGAFALLPSAFLAVMWAPLWAVAAYDGLAWRAVLSDAYMLKRLAWTVFQAA 

orfl39ng ATCVLVLPLGVPVAWVLARLAFPGRALVLRLLMLPFVMPTLVAGVGVLALFGADGLLWRG 
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
orf 139-1 ATCVLVLPLGVPVAWVLARLAFPGRALVLRLLMLPFVMPTLVAGVGVLALFGADGLLWRG 

orfl39ng RQDTPYLLLYGNVFFNLPVLVRAAYQGFAQVPAARLQTARTLGAGAWRRFWDIEMPVLRP 
I I I I I I I I I I I I I I I I I I I I I I I I I I I I : I I I I I I I I I I I I I I I I I I 1 I I I I I I I I I I I I 
orf 139-1 RQDTPYLLLYGNVFFNLPVLVRAAYQGFVQVPAARLQTARTLGAGAWRRFWDIEMPVLRP 

orfl39ng WLAGGVCLVFLYCFSGFGLALLLGGSRYATVEVEIYQLVMFELDMAGASALVWLVLGVTA 
I t I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I : I I I I I I I I I I 
orf 139-1 WLAGGVCLVFLYCFSGFGLALLLGGSRYATVEVEIYQLVMFELDMAVASVLVWLVLGVTA 

orfl39ng AAGLLYAWFGRRAVSDKAVSPVMPSPPQSVGEYVLLAFSVAVLSVCCLFPLSAIWKAWS 
1 I J I I I I i I I J I I I I I I I 1 I I 1 I r I J I I I I I I E J f I f r 1 | j I | | | J j | I MINIM 
orfl39-l AAGLLYAWFGRRAVSDKAVSPVMPSPPQSVGEYVLLAFAAAVLSVCCLFPLLAIVVKAWS 

o r f 1 3 9ng AGESRRVLME SETWQAVWNTLRFS AAAVFAAAVLG WYAAAARRLVWMRGLVFLP FMVS P 

MM I I M I I II I I M I I I II M I M I: M 11 I I I II I I I I M MMMMIIIMM 
orf 139 AGE SWRVLMESETWQAVWNTLRFSAAAVYAAAVLGWYAAAARRSAWMRGLMFLP FMVS P 

orfl39ng VCVSAGVLLLYPGWTASLPLLLAMYALLAYPFVAKDVLSAWDALPPDYGRAAAGLGANGF 
I II I M I I I I I I II I II I I II I II I I II i I I II I I I I II II II II I II II I I I I I II I I 
orf 139-1 VCVSAGVLLLYPQWTASLPLLLAMYALLAYPFVAKDVLSAWDALPPDYGRAAAGLGANGF 

orfl39ng QTACRITFPLLKPALRRGLTLAAATCVGEFAATLFLSRPEWQTLTTLIYAYLGRAGEDNY 
I I II II I I II I I 1 II I I I I II I I II I II I I I II I I M I I I I II II I II I II I I I I I II II 
orf 139-1 QTACRITFPLLKPALRRGLTLAAATCVGEFAATLFLSRPEWQTLTTLIYAYLGRAGEDNY 

orfl39ng ARAMVLTLLLSAFAVCIFLLLDNGEGGKRTETL 
MIMIMI 1:111: I M I I : I I M I : I ! I 
orf 139-1 ARAMVLTLLLAAFALGIFLLLDGGEGGKQTETL 

Based on the presence of a predicted binding-protein-dependent transport systems inner membrane 
component signature (underlined) in the gonococcal protein, it is predicted that the proteins from 
N. meningitidis and N. gonorrhoeae, and their epitopes, could be useful antigens for vaccines or 
diagnostics, or for raising antibodies. 



Example 70 



The following partial DNA sequence was identified in N. meningitidis <SEQ ID 583>: 

1 ATGGACGGCT GGACACAGAC GCTGTCCGCG CAAACCCTGT TGGGCATTTC 

51 GGCGGCGGCA ATCATCCTCA TTCTGATTTT AATCGTCAGA TTCCGCATCC 

101 ACGCGCTGCT GACACTGGTC ATCGTCAGCC TGCTGACGGC TTTGGCAACC 

151 GGTTTGCCCA CAGGCAGCAT TGTCAAAGAC ATACTGGTCA AAAACTTCGG 

201 CGGCACGCTC GGCGGCGTGG CGCTTCTGGT CGGCCTGGGC GCGATGCTCG 

251 AACGTTTGGT C. . . 

This corresponds to the amino acid sequence <SEQ ID 584; ORF140>: 

1 MDGWTQTLSA QTLLGISAAA IILILILIVR FRIHALLTLV IVSLLTALAT 
51 GLPTGSIVKD ILVKNFGGTL GGVALLVGLG AMLERLV. . 
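The step from SEQ ID 583 to SEQ ID 584 is a conceptual translation in reading frame 1. A minimal Python sketch of that step follows, assuming the standard genetic code (NCBI translation table 1) and a frame that starts at the first listed nucleotide. Rendering any codon that contains an ambiguous base (such as N) as X is a convention chosen for this sketch; it mirrors the X placeholders in the strain A sequences, but the source does not state how they were produced.

```python
# Standard genetic code (NCBI translation table 1). Codons are enumerated with
# the first, second and third base each running through T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate DNA in reading frame 1; '*' marks a stop codon.

    Digits and whitespace are discarded first, so a row copied from a
    listing such as "51 GGCGGCGGCA ..." can be passed in as-is. Any codon
    containing a base other than A, C, G or T (for example an unresolved N)
    is rendered as X. A trailing partial codon is ignored.
    """
    seq = "".join(ch for ch in dna.upper() if ch.isalpha())
    return "".join(
        CODON_TABLE.get(seq[pos:pos + 3], "X")
        for pos in range(0, len(seq) - 2, 3)
    )


# First 60 nucleotides of SEQ ID 583: the row numbered 1 plus the first
# block of the row numbered 51.
print(translate("ATGGACGGCT GGACACAGAC GCTGTCCGCG CAAACCCTGT TGGGCATTTC GGCGGCGGCA"))
# -> MDGWTQTLSAQTLLGISAAA (the first 20 residues of SEQ ID 584)
```

Likewise, the last 21 nucleotides of SEQ ID 583 (GCGATGCTCG AACGTTTGGT C) translate to AMLERLV, the final residues of SEQ ID 584.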

Further work revealed the complete nucleotide sequence <SEQ ID 585>: 

1 ATGGACGGCT GGACACAGAC GCTGTCCGCG CAAACCCTGT TGGGCATTTC 

51 GGCGGCGGCA ATCATCCTCA TTCTGATTTT AATCGTCAAA TTCCGCATCC 

101 ACGCGCTGCT GACACTGGTC ATCGTCAGCC TGCTGACGGC TTTGGCAACC 

151 GGTTTGCCCA CAGGCAGCAT TGTCAACGAC ATACTGGTCA AAAACTTCGG 
201 CGGCACGCTC GGCGGCGTGG CGCTTCTGGT CGGCCTGGGC GCGATGCTCG 

251 GACGTTTGGT CGAAACATCC GGCGGCGCAC AGTCGCTGGC GGACGCGCTG 

301 ATCCGGATGT TCGGCGAAAA ACGCGCACCG TTCGCGCTGG GCGTTGCCTC 

351 GCTGATTTTC GGCTTCCCGA TTTTCTTCGA TGCCGGACTA ATCGTCATGC 

401 TGCCCATCGT GTTCGCCACC GCACGGCGCA TGAAACAGGA CGTACTGCCC 

451 TTCGCGCTTG CCTCCATCGG CGCATTTTCC GTCATGCACG TCTTCCTGCC 

501 GCCCCATCCG GGCCCGATTG CCGCTTCCGA ATTTTACGGC GCGAACATCG 

551 GCCAAGTTTT GATTTTGGGT CTGCCGACCG CCTTCATCAC ATGGTATTTC 

601 AGCGGCTATA TGCTCGGCAA AGTGTTGGGG CGCACCATCC ATGTTCCCGT 

651 TCCCGAACTG CTCAGCGGCG GCACGCAAGA CAACGACCTG CCGAAAGAAC 

701 CTGCCAAAGC AGGAACGGTC GTCGCCATCA TGCTGATTCC CATGCTGCTG 

751 ATTTTCCTGA ATACCGGCGT ATCGGCCCTC ATCAGCGAAA AACTCGTAAG 

801 TGCGGACGAA ACCTGGGTTC AGACGGCAAA AATAATCGGT TCGACACCGA 

851 TCGCCCTTCT GATTTCCGTA TTGGTCGCAC TGTTTGTCTT GGGACGCAAA 

901 CGCGGCGAAA GCGGCAGCGC GTTGGAAAAA ACCGTGGACG GCGCACTCGC 

951 CCCCGTCTGT TCCGTGATTC TGATTACCGG CGCGGGCGGT ATGTTCGGCG 

1001 GCGTTTTGCG CGCTTCCGGC ATCGGCAAGG CACTCGCCGA CAGCATGGCG 

1051 GATTTGGGCA TTCCCGTCCT TTTGGGCTGT TTCCTTGTCG CCTTGGCACT 

1101 GCGTATCGCG CAAGGTTCGG CAACCGTCGC CCTGACCACC GCCGCCGCGC 

1151 TGATGGCTCC TGCCGTTGCC GCCGCCGGCT TTACCGACTG GCAGCTCGCC 

1201 TGTATCGTAT TGGCAACGGC GGCAGGTTCG GTCGGTTGCA GCCACTTCAA 

1251 CGACTCCGGC TTCTGGCTGG TCGGCCGTCT CTTGGACATG GACGTACCGA 

1301 CCACGCTGAA AACCTGGACG GTCAACCAAA CCCTCATCGC ACTCATCGGC 

1351 TTTGCCTTGT CCGCACTGCT GTTCGCCATC GTCTGA 

This corresponds to the amino acid sequence <SEQ ID 586; ORF140-1>: 



1 MDGWTQTLSA QTLLGISAAA IILILILIVK FRIHALLTLV IVSLLTALAT 

51 GLPTGSIVND ILVKNFGGTL GGVALLVGLG AMLGRLVETS GGAQSLADAL 

101 IRMFGEKRAP FALGVASLIF GFPIFFDAGL IVMLPIVFAT ARRMKQDVLP 

151 FALASIGAFS VMHVFLPPHP GPIAASEFYG ANIGQVLILG LPTAFITWYF 

201 SGYMLGKVLG RTIHVPVPEL LSGGTQDNDL PKEPAKAGTV VAIMLIPMLL 

251 IFLNTGVSAL ISEKLVSADE TWVQTAKIIG STPIALLISV LVALFVLGRK 

301 RGESGSALEK TVDGALAPVC SVILITGAGG MFGGVLRASG IGKALADSMA 

351 DLGIPVLLGC FLVALALRIA QGSATVALTT AAALMAPAVA AAGFTDWQLA 

401 CIVLATAAGS VGCSHFNDSG FWLVGRLLDM DVPTTLKTWT VNQTLIALIG 

451 FALSALLFAI V* 

Computer analysis of this amino acid sequence gave the following results: 



Homology with a predicted ORF from N. meningitidis (strain A) 

ORF140 shows 95.4% identity over an 87aa overlap with an ORF (ORF140a) from strain A of N. 
meningitidis: 

10 20 30 40 50 60 

orf 140. pep MDGWTQTLSAQTLLGISAAAIILILILIVRFRIHALLTLVIVSLLTALATG LPTGSIVKD 
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I : I I I I I I I I I I I I I I I I I I I I I I I I I I I I: I 
orfl40a MDGWTQTLSAQTLLGISAAAIILILILIVKFRIHALLTLVIVSLLTALATG LPTGSIVND 

10 20 30 40 50 60 



70 80 
orf 1 4 0 . pep I LVKN FGGTL GG VALL VGLGAMLERLV 
: I I I I I I I I I I I I I I I I I I I I II III 
orf 140a VLVKNFGGTL GGVALLVGLGAMLGRLV ETSGGAQSLADALIRMFGEKRAPFALGVASLIF 

70 80 90 100 110 120 
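For an ungapped overlap like this one, percent identity is the number of identical aligned residues divided by the overlap length. The following minimal Python sketch assumes the two sequences are already aligned position-by-position with no gaps. It does not perform an alignment itself, and the helper name `ungapped_identity` is hypothetical. Applied to ORF140 and the first 87 residues of ORF140a, it finds four mismatches, giving 83/87 identical residues:

```python
from __future__ import annotations


def ungapped_identity(a: str, b: str) -> tuple[int, int, list[int]]:
    """Compare two equal-length sequences that are already aligned without gaps.

    Returns (identical positions, overlap length, 1-based mismatch positions).
    """
    if len(a) != len(b):
        raise ValueError("an ungapped comparison needs sequences of equal length")
    mismatches = [pos for pos, (x, y) in enumerate(zip(a, b), start=1) if x != y]
    return len(a) - len(mismatches), len(a), mismatches


# ORF140 (SEQ ID 584) and the first 87 residues of ORF140a (SEQ ID 588).
orf140 = ("MDGWTQTLSAQTLLGISAAAIILILILIVRFRIHALLTLVIVSLLTALAT"
          "GLPTGSIVKDILVKNFGGTLGGVALLVGLGAMLERLV")
orf140a = ("MDGWTQTLSAQTLLGISAAAIILILILIVKFRIHALLTLVIVSLLTALAT"
           "GLPTGSIVNDVLVKNFGGTLGGVALLVGLGAMLGRLV")

same, overlap, diffs = ungapped_identity(orf140, orf140a)
print(f"{same}/{overlap} identical ({100 * same / overlap:.1f}%), mismatches at {diffs}")
# -> 83/87 identical (95.4%), mismatches at [30, 59, 61, 84]
```

Using the first 87 residues of ORF140ng (SEQ ID 590) in place of ORF140a gives 80/87 identical residues, or about 92%. The longer gapped alignments in this document would need a proper alignment program rather than this position-by-position count.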

The complete length ORF140a nucleotide sequence <SEQ ID 587> is: 

1 ATGGACGGCT GGACACAGAC GCTGTCCGCG CAAACCCTGT TGGGCATTTC 

51 GGCGGCGGCA ATCATCCTCA TTCTGATTTT AATCGTCAAA TTCCGCATCC 

101 ACGCGCTGCT GACACTGGTC ATCGTCAGCC TGCTGACGGC TTTGGCAACC 

151 GGTTTGCCCA CAGGCAGCAT TGTCAACGAC GTACTGGTCA AAAACTTCGG 

201 CGGCACGCTC GGCGGCGTGG CGCTTCTGGT CGGCCTGGGC GCGATGCTCG 

251 GACGTTTGGT CGAAACATCC GGCGGCGCAC AGTCGCTGGC GGACGCGCTG 

301 ATCCGGATGT TCGGCGAAAA ACGCGCACCG TTCGCGCTGG GCGTTGCCTC 

351 GCTGATTTTC GGCTTCCCGA TTTTCTTCGA TGCCGGACTA ATCGTCATGC 

401 TGCCCATCGT GTTCGCCACC GCACGGCGCA TGAAACAGGA CGTACTGCCC 

451 TTCGCGCTTG CCTCCATCGG CGCATTTTCC GTCATGCACG TCTTCCTGCC 
501 GCCCCATCCG GGCCCGATTG CCGCTTCCGA ATTTTACGGC GCGAACATCG 

551 GCCAAGTTTT GATTTTGGGT CTGCCGACCG CCTTCATCAC ATGGTATTTC 

601 AGCGGCTATA TGCTCGGCAA AGTGTTGGGG CGCACCATCC ATGTTCCCGT 

651 TCCCGAACTG CTCAGCGGCG GCACGCAAGA CAACGACCTG CCGAAAGAAC 

701 CTGCCAAAGC AGGAACGGTC GTCGCCATCA TGCTGATTCC CATGCTGCTG 

751 ATTTTCCTGA ATACCGGCGT ATCGGCCCTC ATCAGCGAAA AACTCGTAAG 

801 TGCGGACGAA ACCTGGGTTC AGACGGCAAA AATAATCGGT TCGACACCGA 

851 TCGCCCTTCT GATTTCCGTA TTGGTCGCAC TGTTTGTCTT GGGACGCAAA 

901 CGCGGCGAAA GCGGCAGCGC GTTGGAAAAA ACCGTGGACG GCGCACTCGC 

951 CCCCGTCTGT TCCGTGATTC TGATTACCGG CGCGGGCGGT ATGTTCGGCG 

1001 GCGTTTTGCG CGCTTCCGGC ATCGGCAAGG CACTCGCCGA CAGCATGGCG 

1051 GATTTGGGCA TTCCCGTCCT TTTGGGCTGT TTCCTTGTCG CCTTGGCACT 

1101 GCGTATCGCG CAAGGTTCGG CAACCGTCGC CCTGACCACC GCCGCCGCGC 

1151 TGATGGCTCC TGCCGTTGCC GCCGCCGGCT TTACCGACTG GCAGCTCGCC 

1201 TGTATCGTAT TGGCAACGGC GGCAGGTTCG GTCGGTTGCA GCCACTTCAA 

1251 CGACTCCGGC TTCTGGCTGG TCGGCCGCCT CTTGGACATG GACGTACCGA 

1301 CCACGCTGAA AACCTGGACG GTCAACCAAA CCCTCATCGC ACTCATCGGC 

1351 TTTGCCTTGT CCGCACTGCT GTTCGCCATC GTCTGA 

This encodes a protein having amino acid sequence <SEQ ID 588>: 

1 MDGWTQTLSA QTLLGISAAA IILILILIVK FRIHALLTLV IVSLLTALAT 

51 GLPTGSIVND VLVKNFGGTL GGVALLVGLG AMLGRLVETS GGAQSLADAL 

101 IRMFGEKRAP FALGVASLIF GFPIFFDAGL IVMLPIVFAT ARRMKQDVLP 

151 FALASIGAFS VMHVFLPPHP GPIAASEFYG ANIGQVLILG LPTAFITWYF 

201 SGYMLGKVLG RTIHVPVPEL LSGGTQDNDL PKEPAKAGTV VAIMLIPMLL 

251 IFLNTGVSAL ISEKLVSADE TWVQTAKIIG STPIALLISV LVALFVLGRK 

301 RGESGSALEK TVDGALAPVC SVILITGAGG MFGGVLRASG IGKALADSMA 

351 DLGIPVLLGC FLVALALRIA QGSATVALTT AAALMAPAVA AAGFTDWQLA 

401 CIVLATAAGS VGCSHFNDSG FWLVGRLLDM DVPTTLKTWT VNQTLIALIG 

451 FALSALLFAI V* 

ORF140a and ORF140-1 show 99.8% identity over a 461aa overlap: 

orf 140-1. pep MDGWTQTLSAQTLLGISAAAIILILILIVKFRIHALLTLVIVSLLTALATGLPTGSIVND 60 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I i I I I 
orfl40a MDGWTQTLSAQTLLGISAAAIILILILIVKFRIHALLTLVIVSLLTALATGLPTGSIVND 60 

orf 140-1 .pep I LVKNFGGTLGGVALLVGLGAMLGRLVETSGGAQSLADALIRMFGEKRAP FALGVASLIF 120 

: I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
orfl40a VLVKN FGGTLGGVALLVGLGAMLGRLVET SGGAQS LADALI RMFGEKRAPFALGVAS L I F 120 

orf 140-1 .pep GFPIFFDAGLIVMLPIVFATARRMKQDVLPFALASIGAFSVMHVFLPPHPGPIAASEFYG 180 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 11 I 
orf 140a GFPIFFDAGLIVMLPIVFATARRMKQDVLPFALASIGAFSVMHVFLPPHPGPIAASEFYG 180 

orf 140-1 .pep ANIGQVLILGLPTAFITWYFSGYMLGKVLGRTIHVPVPELLSGGTQDNDLPKEPAKAGTV 240 

I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I 
orf 140a ANIGQVLILGLPTAFITWYFSGYMLGKVLGRTIHVPVPELLSGGTQDNDLPKEPAKAGTV 240 

orf 140-1 .pep VAIMLIPMLLIFLNTGVSALISEKLVSADETWVQTAKIIGSTPIALLISVLVALFVLGRK 300 

I I II I Ml I I I I I I I I I I II I I I I I I i I I I I I I I 1 I I I M I I I I I i 11 I I I II I I I I 1 I I 
o r f 1 4 0a VAIMLI PMLL I FLNTGVSAL I SEKLVSADETWVQTAKI IGSTPIALLI S VLVALFVLGRK 300 

orf 14 0-1 .pep RGESGSALEKTVDGALAPVCSVILITGAGGMFGGVLRASGIGKALADSMADLGIPVLLGC 360 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I 
orf 140a RGESGSALEKTVDGALAPVCSVILITGAGGMFGGVLRASGIGKALADSMADLGIPVLLGC 360 

orf 140-1 .pep FLVALALRIAQGSATVALTTAAALMAPAVAAAGFTDWQLACIVLATAAGSVGCSHFNDSG 420 

I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I 
orf 14 0a FLVALALRIAQGSATVALTTAAALMAPAVAAAGFTDWQLACIVLATAAGSVGCSHFNDSG 420 

orf 140-1 .pep FWLVGRLLDMDVPTTLKTWTVNQTLIALIGFALSALLFAIV 461 

I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I 
or f 14 0a FWLVGRLLDM DVPTTLKTWT VNQTL I AL I GFALS ALL FAIV 4 61 

Homology with a predicted ORF from N. gonorrhoeae 

ORF140 shows 92% identity over an 87aa overlap with a predicted ORF (ORF140ng) from 
N. gonorrhoeae: 
orf140.pep MDGWTQTLSAQTLLGISAAAIILILILIVRFRIHALLTLVIVSLLTALATGLPTGSIVKD 60 

ill I I i ! I I I I I I I I I I I I I I I I I I I I I : I I I : I I I I I I I : I I I I I I I I I I I I I I I I : I 
orfl40ng MDGRTQTLSAQTLLGISAAAIILILILIVKFRIRALLTLVIASLLTALATGLPTGSIVND 60 

or f 1 4 0 . pep I LVKN FGGTLGG VALLVGLGAMLERLV 87 

: I II I I I I I I I I I I I I I I I I I I I Ml 
orfl40ng VLVKNFGGTLGGVALLVGLGAMLGRLVETSGGAQSLADALIRMFGEKRAPFAPGVASLIF 120 

The complete length ORF140ng nucleotide sequence <SEQ ID 589> was predicted to encode a 
protein having amino acid sequence <SEQ ID 590>: 



1 MDGRTQTLSA QTLLGISAAA IILILILIVK FRIRALLTLV IASLLTALAT 

51 GLPTGSIVND VLVKNFGGTL GGVALLVGLG AMLGRLVETS GGAQSLADAL 

101 IRMFGEKRAP FAPGVASLIF GFPIFFDAGL IVMLPIVFAT ARRMKQDVLP 

151 FALASVGAFS VMHVFLPPHP GPIAASEFYG ANIGQVLILG LPTAFITWYF 

201 SGYMLGKVLG RAIHVPVPEL LSGGTQDSDP PKEPAKAGTV VAVMLIPMLL 

251 IFLNTGVSAL ISEKLVSADE TWVQTAKMIG STPVALLISV LAALLVLGRK 

301 RGESGSTLEK TVDGALAPAC SVILITGAGG MFGGVLRASG IGKALADSMA 

351 DLGIPVLLGC FLVALALRIA QGSATVALTT AAALMAPAVA AAGFTDWQLA 

401 CIVLATAAGS VGCSHFNDSG FWLVGRLSDM DVPTTLKTWT VNQTLIAFIG 

451 FALSALLFAI V* 

Further work revealed a variant gonococcal DNA sequence <SEQ ID 591>: 



1 ATGGACGGCC GGACACAGAC GCTGTCCGCG CAAACCTTGT TGGGCATTTC 

51 GGCGGCGGCA ATCATCCTCA TTCTGATTTT AATCGTCAAA TTCCGCATCC 

101 GCGCGCTGCT GACACTGGTC ATCGCCAGCC TGCTGACGGC TTTGGCAACC 

151 GGTTTGCCCA CAGGCAGCAT CGTCAACGAC GTACTGGTCA AAAACTTCGG 

201 CGGCACGCTC GGCGGCGTGG CGCTTCTGGT CGGTCTGGGC GCAATGCTCG 

251 GACGTTTGGT AGAAACATCC GGCGGCGCAC AGTCGCTGGC GGACGCGCTG 

301 ATCCGGATGT TCGGCGAAAA ACGCGCACCG TTCGCTCCGG GCGTTGCCTC 

351 GCTGATTTTC GGCTTCCCGA TTTTCTTCGA TGCCGGACTA ATCGTCATGC 

401 TGCCCATCGT ATTCGCCACC GCACGGCGCA TGAAACAGGA CGTACTGCCC 

451 TTCGCGCTTG CCTCCGTCGG CGCATTTTCC GTCATGCACG TCTTCCTGCC 

501 GCCCCATCCG GGCCCGATTG CCGCTTCCGA ATTTTACGGC GCGAACATCG 

551 GCCAGGTTTT GATTTTGGGT CTGCCGACCG CCTTCATCAC ATGGTATTTC 

601 AGCGGCTATA TGCTCGGCAA AGTGTTGGGG CGCGCCATCC ATGTTCCCGT 

651 TCCCGAACTG CTCAGCGGCG GCACGCAAGA CAGCGACCCG CCGAAAGAAC 

701 CTGCCAAAGC AGGAACGGTC GTCGCCGTCA TGCTGATTCC CATGCTGCTG 

751 ATTTTCCTGA ATACCGGCGT ATCAGCCCTC ATCAGCGAAA AACTCGTAAG 

801 TGCGGACGAA ACTTGGGTTC AGACGGCAAA AATGATCGGT TCGACACCTG 

851 TCGCCCTTCT GATTTCCGTA TTGGCCGCAC TGTTGGTCTT GGGACGCAAA 

901 CGCGGCGAAA GCGGCAGCAC GTTGGAAAAA ACCGTGGACG GCGCACTCGC 

951 CCCCGCCTGT TCCGTGATTC TGATTACCGG CGCGGGCGGT ATGTTCGGCG 

1001 GCGTTTTGCG CGCTTCCGGC ATCGGCAAGG CACTCGCCGA CAGCATGGCG 

1051 GATTTGGGCA TTCCCGTCCT TTTGGGCTGC TTCCTTGTCG CCTTGGCACT 

1101 GCGTATCGCG CAAGGTTCGG CAACCGTCGC CCTGACCACA GCCGCCGCGC 

1151 TGATGGCTCC TGCCGTTGCC GCCGCCGGCT TTACCGACTG GCAGCTCGCC 

1201 TGTATCGTAT TGGCAACGGC GGCAGGTTCG GTCGGTTGCA GCCACTTCAA 

1251 CGACTCCGGC TTCTGGCTGG TCGGCCGCCT CTTGGATATG GACGTACCGA 

1301 CCACGCTGAA AACCTGGACG GTCAACCAAA CCCTCATCGC ATTCATCGGC 

1351 TTTGCCTTGT CCGCACTGCT GTTTGCCATC GTCTGA 

This corresponds to the amino acid sequence <SEQ ID 592; ORF140ng-1>: 



1 MDGRTQTLSA QTLLGISAAA IILILILIVK FRIRALLTLV IASLLTALAT 

51 GLPTGSIVND VLVKNFGGTL GGVALLVGLG AMLGRLVETS GGAQSLADAL 

101 IRMFGEKRAP FAPGVASLIF GFPIFFDAGL IVMLPIVFAT ARRMKQDVLP 

151 FALASVGAFS VMHVFLPPHP GPIAASEFYG ANIGQVLILG LPTAFITWYF 

201 SGYMLGKVLG RAIHVPVPEL LSGGTQDSDP PKEPAKAGTV VAVMLIPMLL 

251 IFLNTGVSAL ISEKLVSADE TWVQTAKMIG STPVALLISV LAALLVLGRK 

301 RGESGSTLEK TVDGALAPAC SVILITGAGG MFGGVLRASG IGKALADSMA 

351 DLGIPVLLGC FLVALALRIA QGSATVALTT AAALMAPAVA AAGFTDWQLA 

401 CIVLATAAGS VGCSHFNDSG FWLVGRLLDM DVPTTLKTWT VNQTLIAFIG 

451 FALSALLFAI V* 

ORF140ng-1 and ORF140-1 show 96.3% identity over a 461aa overlap: 



orfl40ng-l.pep MDGRTQTLSAQTLLGISAAAIILILILIVKFRIRALLTLVIASLLTALATGLPTGSIVND 
Ml M I M II I I I I I [ I i I I | I | M | | I II I I : I I I I I M : I I I I M I I I I I I II I II I 
orf140-1 MDGWTQTLSAQTLLGISAAAIILILILIVKFRIHALLTLVIVSLLTALATGLPTGSIVND 

orfl40ng-l.pep VLVKNFGGTLGGVALLVGLGAMLGRLVETSGGAQSLADALIRMFGEKRAPFAPGVASLIF 
: I I 1 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
orf140-1 ILVKNFGGTLGGVALLVGLGAMLGRLVETSGGAQSLADALIRMFGEKRAPFALGVASLIF 

orfl40ng-l.pep GFPIFFDAGLIVMLPIVFATARRMKQDVLPFALASVGAFSVMHVFLPPHPGPIAASEFYG 
I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I : II I I I I I I I ! I I I I I I I I I I I I I I 
Orfl4 0-1 GFPIFFDAGLIVMLPIVFATARRMKQDVLPFALASIGAFSVMHVFLPPHPGPIAASEFYG 


orfl40ng-l.pep ANIGQVLILGLPTAFITWYFSGYMLGKVLGRAIHVPVPELLSGGTQDSDPPKEPAKAGTV 
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I : I I I I I I I I I I I I I I I : I II I I I I I I I I 
orf 140-1 ANIGQVLILGLPTAFITWYFSGYMLGKVLGRTIHVPVPELLSGGTQDNDLPKEPAKAGTV 

orf140ng-1.pep VAVMLIPMLLIFLNTGVSALISEKLVSADETWVQTAKMIGSTPVALLISVLAALLVLGRK 

I I : I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I : I I I I I : I I I II I I : I I : I I I I I 
orf 140-1 VAIMLI PMLLI FLNTGVSALI SEKLVSADETWVQTAKI IGST PI ALLI SVLVALFVLGRK 



orf 14 Ong-l . pep RGESGSTLEKTVDGALAPACSVILITGAGGMFGGVLRASGIGKALADSMADLGIPVLLGC 
I I I I I I : I I I I I I I I 1 I I : I I I I I I I I I I I I I I I I I I I 1 I I I I I I I I I I I I I I I II I I I I 

orf 140-1 RGESGSALEKTVDGALAPVCSVILITGAGGMFGGVLRASGIGKALADSMADLGIPVLLGC 



orf 140ng-l .pep FLVALALRIAQGSATVALTTAAALMAPAVAAAGFTDWQLACIVLATAAGSVGCSHFNDSG 
I I I I I I I I I I I I I I I I I I I I I I I I I I I II II I I I I I I I I I I 1 I I I I I I I I I I I I I I I I I I 
orf140-1 FLVALALRIAQGSATVALTTAAALMAPAVAAAGFTDWQLACIVLATAAGSVGCSHFNDSG 



orf 14 0ng-l .pep FWLVGRLLDMDVPTTLKTWTVNQTLIAFIGFALSALLFAIV 
I I I I I I I I I I I I I I I I I I I II I I I I I I : I I I I I I I I I I I I I 
orf 140-1 FWLVGRLLDMDVPTTLKTWTVNQTLIALIGFALSALLFAIV 

Furthermore, ORF140ng-1 is homologous to an E. coli protein: 

gi|882633 (U29579) ORF_o454 [Escherichia coli] >gi|1789097 (AE000358) o454; 
This 454 aa ORF is 34% identical (9 gaps) to 444 residues of an approx. 456 aa 
protein GNTP_BACLI SW:P46832 [Escherichia coli] Length = 454 
Score = 210 bits (529), Expect = 1e-53 
Identities = 130/384 (33%), Positives = 194/384 (49%), Gaps = 19/384 (4%) 
Query: 88  ETSGGAQSLADALIRMFGEKRAPFAPGVASLIFGFPIFFDAGLIVMLPIVFATARRMKQD 147
            E SGGA+SLA+ R G+KR A +A+ G P+FFD G I++ PI++ A+ K
Sbjct: 80  EHSGGAESLANYFSRKLGDKRTIAALTLAAFFLGIPVFFDVGFIILAPIIYGFAKVAKIS 139

Query: 148 ...
            L F L +HV +PPHPGP+AA+ A+IG + I+G+ + I GY K
Sbjct: 140 ... I PVGWGYFAAK 198

Query: 208 VLGRAIHVPVPELL- ... -SGGTQDSDPPKEPAKAGTVVAVMLIPMLLIFLNTGV 257
            ++ + + E+L G T+ SD P A V ++++IP+ +I T
Sbjct: 199 ... SEGATKLSDKINPPGVA-LVTSLIVIPIAIIMAGT-- 255

Query: 258 ...
            +S L+ + T ++IGS +RG S + AL
Sbjct: 256 --VSATLMPPSHPLLGTLQLIGSPMVALMIALVLAFWLLALRRGWSLQHTSDIMGSALP 312

Query: 318 ...
            A VIL+TGAGG+FG VL SG+GKALA+ + + +P+L F+++LALR +QGS
Sbjct: 313 ... -AT 370

Query: 378 ... LK 437
            + LA G +G SH NDSGFW+V + L + V
Sbjct: 371 ... 430

Query: 438 TWTVNQTLIAFIGFALSALLFAIV 461
            TWTV T++ F GF ++ ++A++
Sbjct: 431 TWTVLTTILGFTGFLITWCVWAVI 454



Based on this analysis, including the identification of a putative leader sequence
(double-underlined) and several putative transmembrane domains (single-underlined) in the
gonococcal protein, it is predicted that the proteins from N. meningitidis and N. gonorrhoeae, and
their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies.

Example 71 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 593>:



1 . . GATTTCGGCA TATCGCCCGT GTATCTTTGG GTTGCCGCCG CGTTCAAACA
51 TTTGCTGTCG CCGTGGGCTG CCGACTCATA CGATGTCGCA CGCTTTGCAG
101 GCGTATTTTT TGCCGTTATC GGACTGACTT CCTGCGGCTT TGCCGGTTTC
151 AACTTTTTGG GCAGACACCA CGGGCGCAC. GTCGTCCTGA TTCTCATCGG
201 CTGTATCGGG CTGATTCCAG TTGCCCATTT CCTCAACCCC GCTGCCGCCG
251 CCTTTGCCGC CGCCGGACTG GTGCTGCACG GTTATTCTTT GGCTCGCCGG
301 CGCGTGATTG CCGCCTCTTT TCTGCTCGGT ACGGGCTGGA CGCTGATGTC
351 GTTGGCAGCA GCTTATCCGG CAGCATTTGC CCTGATGCTG CCCTTGCCCG
401 TACTGATGTT TTTCCGTCCG . .

This corresponds to the amino acid sequence <SEQ ID 594; ORF141>: 

1 . . DFGISPVYLW VAAAFKHLLS PWAADSYDVA RFAGVFFAVI GLTSCGFAGF 
51 NFLGRHHGRX VVLILIGCIG LIPVAHFLNP AAAAFAAAGL VLHGYSLARR
101 RVIAASFLLG TGWTLMSLAA AYPAAFALML PLPVLMFFRP . . 

Further work revealed the complete nucleotide sequence <SEQ ID 595>: 



1 ATGCTGACCT ATACCCCGCC CGATGCCCGC CCGCCCGCCA AAACCCACGA
51 AAAGCCGTGG CTGCTGCTGT TGATGGCGTT TGCCTGGTTG TGGCCCGGCG
101 TGTTTTCCCA CGATTTGTGG AATCCTGACG AACCTGCCGT CTATACCGCC
151 GTCGAAGCAC TGGCAGGCAG CCCCACCCCC TTGGTTGCCC ATCTGTTCGG
201 TCAAACCGAT TTCGGCATAC CGCCCGTGTA TCTTTGGGTT GCCGCCGCGT
251 TCAAACATTT GCTGTCGCCG TGGGCTGCCG ACTCATACGA TGCCGCACGC
301 TTTGCAGGCG TATTTTTTGC CGTTATCGGA CTGACTTCCT GCGGCTTTGC
351 CGGTTTCAAC TTTTTGGGCA GACACCACGG GCGCAgCGTC GTCCTGATTC
401 TCATCGGCTG TATCGGGCTG ATTCCAGTTG CCCATTTCCT CAACCCCGCT
451 GCCGCCGCCT TTGCCGCCGC CGGACTGGTG CTGCACGGTT ATTCTTTGGC
501 TCGCCGGCGC GTGATTGCCG CCTCTTTTCT GCTCGGTACG GGCTGGACGC
551 TGATGTCGTT GGCAGCAGCT TATCCGGCAG CATTTGCCCT GATGCTGCCC
601 TTGCCCGTAC TGATGTTTTT CCGTCCGTGG CAAAGCAGGC GTTTGATGTT
651 GACGGCAGTC GCCTCACTTG CCTTTGCCCT GCCGCTTATG ACCGTTTACC
701 CGCTGCTCTT GGCAAAAACG CAGCCCGCGC TGTTCGCGCA ATGGCTCGAC
751 TATCACGTTT TCGGTACGTT CGGCGGCGTG CGGCACGTTC AGACGGCATT
801 CAGTTTGTTT TACTATCTGA AAAACCTGCT TTGGTTTGCA TTGCCCGCGC
851 TGCCGCTGGC GGTTTGGACG GTTTGCCGCA CGCGCCTGTT TTCGACCGAC
901 TGGGGGATTT TGGGCGTCGT CTGGATGCTT GCCGTTTTGG TGCTGCTTGC
951 CGTCAATCCG CAGCGTTTTC AGGATAACCT CGTCTGGCTG CTTCCGCCGC
1001 TTGCCCTGTT CGGCGCGGCG CAACTGGACA GCCTGAGGCG CGGCGCGGCG
1051 GCGTTTGTCA ACTGGTTCGG CATTATGGCG TTCGGACTGT TTGCCGTGTT
1101 CCTGTGGACG GGCTTTTTCG CCATGAATTA CGGCTGGCCC GCCAAGCTTG
1151 CCGAACGCGC CGCCTATTTC AGCCCGTATT ATGTTCCTGA TATCGATCCC
1201 ATTCCGATGG CGGTTGCCGT ACTGTTCACA CCCTTGTGGC TGTGGGCGAT
1251 TACCCGGAAA AACATACGCG GCAGGCAGGC GGTTACCAAC TGGGCGGCAG
1301 GCGTTACCCT GACCTGGGCT TTGCTGATGA CGCTGTTCCT GCCGTGGCTG
1351 GACGCGGCGA AAAGCCACGC GCCGGTCGTC CGGAGTATGG AGGCATCGCT
1401 TTCCCCGGAA TTGAAACGGG AGCTTTCAGA CGGCATCGAG TGTATCGGCA
1451 TAGGCGGCGG CGACCTGCAC ACGCGGATTG TTTGGACGCA GTACGGCACA
1501 TTGCCGCACC GCGTCGGCGA TGTACAATGC CGCTACCGCA TCGT.CCTCCT
1551 GCCCCAAAAT GCGGATGCGC CGCAAGGCTG GCAGACGGTT TGGCAGGGTG
1601 CGCGTCCGCG CAACAAAGAC AGTAAGTTCG CACTGATACG GAAAATCGGG
1651 GAAAATATAT AA



This corresponds to the amino acid sequence <SEQ ID 596; ORF141-l>: 



1 MLTYTPPDAR PPAKTHEKPW LLLLMAFAWL WPGVFSHDLW NPDEPAVYTA
51 VEALAGSPTP LVAHLFGQTD FGIPPVYLWV AAAFKHLLSP WAADSYDAAR
101 FAGVFFAVIG LTSCGFAGFN FLGRHHGRSV VLILIGCIGL IPVAHFLNPA
151 AAAFAAAGLV LHGYSLARRR VIAASFLLGT GWTLMSLAAA YPAAFALMLP
201 LPVLMFFRPW QSRRLMLTAV ASLAFALPLM TVYPLLLAKT QPALFAQWLD
251 YHVFGTFGGV RHVQTAFSLF YYLKNLLWFA LPALPLAVWT VCRTRLFSTD
301 WGILGVVWML AVLVLLAVNP QRFQDNLVWL LPPLALFGAA QLDSLRRGAA
351 AFVNWFGIMA FGLFAVFLWT GFFAMNYGWP AKLAERAAYF SPYYVPDIDP
401 IPMAVAVLFT PLWLWAITRK NIRGRQAVTN WAAGVTLTWA LLMTLFLPWL
451 DAAKSHAPVV RSMEASLSPE LKRELSDGIE CIGIGGGDLH TRIVWTQYGT
501 LPHRVGDVQC RYRIVLLPQN ADAPQGWQTV WQGARPRNKD SKFALIRKIG
551 ENI*

Computer analysis of this amino acid sequence gave the following results: 



Homology with a predicted ORF from N. meningitidis (strain A) 

ORF141 shows 95.0% identity over a 140aa overlap with an ORF (ORF141a) from strain A of N. 
meningitidis: 

10 20 30 

orf141.pep DFGISPVYLWVAAAFKHLLSPWAADSYDVA

MM MMMMMMMMMM MM 
orf141a WNPDEPAVYTAVEALAGSPTPLVAHLFGQIDFGIPPVYLWVAAAFKHLLSPWAADPYDAA

40 50 60 70 80 90 



40 50 60 70 80 90 

orf141.pep RFAGVFFAVIGLTSCGFAGFNFLGRHHGRXVVLILIGCIGLIPVAHFLNPAAAAFAAAGL
M M M M I M M I M M M M M M M I I M M M M I M i : M I I I I I I I M M M I 
orf141a RFAGVFFAVVGLTSCGFAGFNFLGRHHGRSVVLILIGCIGLIPTVHFLNPAAAAFAAAGL
100 110 120 130 140 150 



100 110 120 130 140 

orf141.pep VLHGYSLARRRVIAASFLLGTGWTLMSLAAAYPAAFALMLPLPVLMFFRP
M M M M M M M M M M M M M M M M M M M M M M M M M 
orf141a VLHGYSLARRRVIAASFLLGTGWTLMSLAAAYPAAFALMLPLPVLMFFRPWQSRRLMLTA
160 170 180 190 200 210 



orf141a VASLAFALPLMTVYPLLLAKTQPALFAQWLDDHVFGTFGGVRHIQTAFSLFYYLKNLLWF
220 230 240 250 260 270 

The complete length ORF141a nucleotide sequence <SEQ ID 597> is:



1 ATGCTGACCT ATACCCCGCC CGATGCCCGC CCGCCCGCCA AAACCCACGA
51 AAAGCCGTGG CTGTTGCTGT TGATGGCGTT TGCCTGGTTG TGGCCCGGCG
101 TGTTTTCCCA CGATTTGTGG AATCCTGACG AACCTGCCGT CTATACCGCC
151 GTCGAAGCAC TGGCAGGCAG CCCCACCCCT TTGGTTGCCC ATCTGTTCGG
201 TCAAATCGAT TTCGGCATAC CGCCCGTGTA TCTTTGGGTT GCCGCCGCGT
251 TCAAACATTT GCTGTCGCCG TGGGCTGCCG ACCCGTATGA TGCCGCACGC
301 TTTGCCGGCG TGTTTTTCGC CGTTGTCGGA CTGACTTCCT GCGGCTTTGC
351 CGGTTTCAAC TTTTTGGGCA GACACCACGG GCGCAGCGTC GTCCTGATTC
401 TCATCGGCTG TATCGGGCTG ATTCCGACCG TACACTTTCT CAACCCCGCT
451 GCCGCCGCCT TTGCCGCCGC CGGACTGGTG CTGCACGGTT ATTCTTTGGC
501 TCGCCGGCGC GTGATTGCCG CCTCTTTTCT GCTCGGTACG GGTTGGACGC
551 TGATGTCGTT GGCAGCAGCT TATCCGGCGG CATTTGCCCT GATGCTGCCC
601 CTGCCCGTGC TGATGTTTTT CCGTCCGTGG CAAAGCAGGC GTTTGATGTT
651 GACGGCAGTC GCCTCGCTTG CCTTTGCCCT GCCGCTTATG ACCGTTTACC
701 CGCTGCTCTT GGCAAAAACG CAGCCCGCGC TGTTCGCGCA ATGGCTCGAC
751 GATCACGTTT TCGGTACGTT CGGCGGCGTG CGGCACATTC AGACGGCATT
801 CAGTTTGTTT TACTATCTGA AAAACCTGCT TTGGTTTGCA TTGCCTGCGC
851 TGCCGCTGGC GGTTTGGACG GTTTGCCGCA CGCGCCTGTT TTCGACCGAC
901 TGGGGGATTT TGGGCGTCGT CTGGATGCTT GCCGTTTTGG TGCTGCTTGC
951 CGTCAATCCG CAGCGTTTTC AGGATAACCT CGTCTGGCTG CTTCCGCCGC
1001 TTGCCCTGTT CGGCGCGGCG CAACTGGACA kCCTGAGACG CGGCGCGGCG
1051 GCGTTTGTCA ACTGGTTCGG CATTATGGCG TTCGGACTGT TTGCCGTGTT
1101 CCTGTGGACG GGCTTTTTCG CCATGAATTA CGGCTGGCCC GCCAAGCTTG
1151 CCGAACGCGC CGCCTATTTC AGCCCGTATT ATGTTCCTGA TATCGATCCC
1201 ATTCCGATGG CGGTTGCCGT ACTGTTCACA CCCTTGTGGC TGTGGGCGAT
1251 TACCCGCAAA AACATACGCG GCAGGCAGGC GGTTACCAAC TGGGCGGCAG
1301 GCGTTACCCT GACCTGGGCT TTGCTGATGA CGCTGTTCCT GCCGTGGCTG
1351 GACGCGGCGA AAAGCCACGC GCCCGTCGTC CGGAGTATGG AGGCATCGCT
1401 TTCCCCGGAA TTAAAACGGG AGCTTTCAGA CGGCATCGAG TGTATCGACA
1451 TAGGCGGCGG CGACCTACAC ACGCGGATTG TTTGGACGCA GTACGGCACA
1501 TTGCCGCACC GCGTCGGCGA TGTACAATGC CGCTACCGCA TCGTCCGCTT
1551 GCCCCAAAAC GCGGATGCGC CGCAAGGCTG GCAGACGGTC TGGCAGGGTG
1601 CGCGCCCGCG CAACAAAGAC AGTAAGTTCG CACTGATACG GAAAACCGGG
1651 GAAAATATAT TAAAAACAAC AGATTGA

This encodes a protein having amino acid sequence <SEQ ID 598>: 



1 MLTYTPPDAR PPAKTHEKPW LLLLMAFAWL WPGVFSHDLW NPDEPAVYTA

51 VEALAGSPTP LVAHLFGQID FGIPPVYLWV AAAFKHLLSP WAADPYDAAR 

101 FAGVFFAVVG LTSCGFAGFN FLGRHHGRSV VLILIGCIGL IPTVHFLNPA

151 AAAFAAAGLV LHGYSLARRR VIAASFLLGT GWTLMSLAAA YPAAFALMLP

201 LPVLMFFRPW QSRRLMLTAV ASLAFALPLM TVYPLLLAKT QPALFAQWLD

251 DHVFGTFGGV RHIQTAFSLF YYLKNLLWFA LPALPLAVWT VCRTRLFSTD 

301 WGILGVVWML AVLVLLAVNP QRFQDNLVWL LPPLALFGAA QLDSLRRGAA

351 AFVNWFGIMA FGLFAVFLWT GFFAMNYGWP AKLAERAAYF SPYYVPDIDP 

401 IPMAVAVLFT PLWLWAITRK NIRGRQAVTN WAAGVTLTWA LLMTLFLPWL

451 DAAKSHAPVV RSMEASLSPE LKRELSDGIE CIDIGGGDLH TRIVWTQYGT

501 LPHRVGDVQC RYRIVRLPQN ADAPQGWQTV WQGARPRNKD SKFALIRKTG 

551 ENILKTTD* 

ORF141a and ORF141-1 show 98.2% identity in 553 aa overlap:



orf141a.pep MLTYTPPDARPPAKTHEKPWLLLLMAFAWLWPGVFSHDLWNPDEPAVYTAVEALAGSPTP
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I M I I I 
orf141-1 MLTYTPPDARPPAKTHEKPWLLLLMAFAWLWPGVFSHDLWNPDEPAVYTAVEALAGSPTP



orf141a.pep LVAHLFGQIDFGIPPVYLWVAAAFKHLLSPWAADPYDAARFAGVFFAVVGLTSCGFAGFN
I I I I I I I I I I I I I I I I I I I I I I I 1 I l I I I I I I I I I I I I I I I ! II I I : I I I I I I I I I I i 
orf141-1 LVAHLFGQTDFGIPPVYLWVAAAFKHLLSPWAADSYDAARFAGVFFAVIGLTSCGFAGFN

orf141a.pep FLGRHHGRSVVLILIGCIGLIPTVHFLNPAAAAFAAAGLVLHGYSLARRRVIAASFLLGT
I [ I I I 1 II I I I II I I I I I I I II : : I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
orf141-1 FLGRHHGRSVVLILIGCIGLIPVAHFLNPAAAAFAAAGLVLHGYSLARRRVIAASFLLGT



orf141a.pep GWTLMSLAAAYPAAFALMLPLPVLMFFRPWQSRRLMLTAVASLAFALPLMTVYPLLLAKT
I I II I I I I I I I I II I I I I I I I I I I I I I I I I 1 I I I I I I I I I I I I I I I I I I I I II I I I I I I I 
orf141-1 GWTLMSLAAAYPAAFALMLPLPVLMFFRPWQSRRLMLTAVASLAFALPLMTVYPLLLAKT



orf141a.pep QPALFAQWLDDHVFGTFGGVRHIQTAFSLFYYLKNLLWFALPALPLAVWTVCRTRLFSTD
I I I I I II I I I I I I I I I I I I I I : I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I II I II 
orf141-1 QPALFAQWLDYHVFGTFGGVRHVQTAFSLFYYLKNLLWFALPALPLAVWTVCRTRLFSTD



orf141a.pep WGILGVVWMLAVLVLLAVNPQRFQDNLVWLLPPLALFGAAQLDSLRRGAAAFVNWFGIMA
I I I I I I I I I II I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
orf141-1 WGILGVVWMLAVLVLLAVNPQRFQDNLVWLLPPLALFGAAQLDSLRRGAAAFVNWFGIMA



orf141a.pep FGLFAVFLWTGFFAMNYGWPAKLAERAAYFSPYYVPDIDPIPMAVAVLFTPLWLWAITRK
I I I I I I I II I II I I I II I I I I I I I I I II I I I I I I I I I II I I I I I I I I I I I I I I I I II I I I 
orf141-1 FGLFAVFLWTGFFAMNYGWPAKLAERAAYFSPYYVPDIDPIPMAVAVLFTPLWLWAITRK



orf141a.pep NIRGRQAVTNWAAGVTLTWALLMTLFLPWLDAAKSHAPVVRSMEASLSPELKRELSDGIE
I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I II I I I I I I 
orf141-1 NIRGRQAVTNWAAGVTLTWALLMTLFLPWLDAAKSHAPVVRSMEASLSPELKRELSDGIE



orf141a.pep CIDIGGGDLHTRIVWTQYGTLPHRVGDVQCRYRIVRLPQNADAPQGWQTVWQGARPRNKD
II I I I I I II I II I I I I I I I II I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II 
orf141-1 CIGIGGGDLHTRIVWTQYGTLPHRVGDVQCRYRIVLLPQNADAPQGWQTVWQGARPRNKD



orf141a.pep SKFALIRKTGENI
I I I I I I I I I I I I 
orf141-1 SKFALIRKIGENI



Homology with a predicted ORF from N. gonorrhoeae

ORF141 shows 95% identity over a 140aa overlap with a predicted ORF (ORF141ng) from 
N. gonorrhoeae: 



orf141.pep DFGISPVYLWVAAAFKHLLSPWAADSYDVA 30

I I I I I I II I I I I I 1 I I I I I I I I I 11:1 
orf141ng WNPAEPAVYTAVEALAGSPTPLVAHLFGQTDFGIPPVYLWVAAAFKHLLSPWAAHPYDAA 126
orf141.pep RFAGVFFAVIGLTSCGFAGFNFLGRHHGRXVVLILIGCIGLIPVAHFLNPAAAAFAAAGL 90

I I I I I I I I I I I I 1 I I I I I I M ! I I I I I I I MM M M M M M M : M M M M M M 
orf141ng RFAGVFFAVIGLTSCGFAGFNFLGRHHGRSVVLIHIGCIGLIPVAHFFNPAAAAFAAAGL 186

orf141.pep VLHGYSLARRRVIAASFLLGTGWTLMSLAAAYPAAFALMLPLPVLMFFRP 140

M I M M M M I M M M M M M M M M M M M M M M M M M M 
orf141ng VLHGYSLARRRVIAASFLLGTGWTLMSLAAAYPAAFALMLPLPVLMFFRPWQSRRLMLTA 246

An ORF141ng nucleotide sequence <SEQ ID 599> was predicted to encode a protein having amino 
acid sequence <SEQ ID 600>: 



1 MPSEAVSARP LCEYLLHLAI RPFLLTLMLT YTPPDARPPA KTHEKPWLLL

51 LMAFAWLWPG VFSHDLWNPA EPAVYTAVEA LAGSPTPLVA HLFGQTDFGI

101 PPVYLWVAAA FKHLLSPWAA HPYDAARFAG VFFAVIGLTS CGFAGFNFLG

151 RHHGRSVVLI HIGCIGLIPV AHFFNPAAAA FAAAGLVLHG YSLARRRVIA

201 ASFLLGTGWT LMSLAAAYPA AFALMLPLPV LMFFRPWQSR RLMLTAVASL

251 AFALPLMTVY PLLLAKTQPA LFAQWLNYHV FGTFGGVRHI QRAFSLFHYL

301 KNLLWFAPPG LPLAVWTVCR TRLFSTDWGI LGIVWMLAVL VLLAFNPQRF

351 QDNLVWLLPP LALFGAAQLD SLRRGAAAFV NWFGIMAFGL FAVFLWTGFF

401 AMNYGWPAKL AERAAYFSPY YVPDIDPIPM AVAVLFTPLW LWAITRKNIR

451 GRQAVTNWAA GVTLTWALLM TLFLPWLDAA KSHAPVVRSM EASFSPELKR

501 ELSDGIECIG IGGGDLHTRI VWTQYGTLPH RVGDVRCRYR IVRLPQNADA 

551 PQGWQTVWQG ARPRNKDSKF ALIRKIGENI LKTTD* 

Further work revealed the following gonococcal DNA sequence <SEQ ID 601>: 



1 ATGCTGACCT ATACCCCGCC CGATGCCCGC CCGCCCGCCA AAACCCACGA 

51 AAAACCGTGG CTGCTGCTGT TGATGGCGTT TGCCTGGCTG TGGCCCGGCG 

101 TGTTTTCCCA CGATTTGTGG AATCCTGCCG AACCTGCCGT CTATACCGCC 

151 GTCGAAGCAC TGGCAGGCAG CCCCACCCCC TTGGTTGCCC ATCTGTTCGG 

201 TCAAACCGAT TTCGGCATAC CGCCCGTGTA TCTTTGGGTT GCCGCCGCAT 

251 TCAAACATTT GCTGTCGCCG TGGGCAGCCG ACCCGTATGA TGCCGCACGC 

301 TTTGCAGGCG TATTTTTTGC CGTTATCGGA CTGACTTCTT GCGGCTTTGC 

351 CGGTTTCAAC TTTTTGGGCA GACACCACGG GCGCAGCGTT GTTTTAATCC 

401 ATATCGGCTG TATCGGGCTG ATTCCGGTTG CCCATTTCCT CAATCCcgcc 

451 gccgccgcct tTGCCGCCGC CGGACTGGTG CTGCacggct actcgctgGC 

501 ACGCCGGCGC GTGATtgccg cctctTtccT GCTCGGTACG GGTTGGACGT 

551 TGATGTCGCT GGCGGCAGCT TATCCGGCGG CGTTTGCGCT GATGCTGCCC 

601 CTGCCCGTGC TGATGTTTTT CCGTCCGTGG CAAAGCAGGC GTTTGATGTT 

651 GACGGCAGTC GCCTCGCTTG CCTTTGCCCT GCCGCTTATG ACCGTTTACC 

701 CGCTGCTCtt gGCAAAAACG CAGCCCGCGC TGTTTGCGCA ATGGCTCAAC 

751 TATCACGTTT TCGGTACGTt cggcgGCGTG CGGCAcaTTC AGAggGCatT 

801 Cagtttgttt cactatctgA AAaatctgct ttggttcgca ccgcccgggC 

851 TGCCGCTGGC GGTTTGGACG GTTTGCCGCA CACGCCTGTT TTCGACCGAC. 

901 TGGGGGATTT TGGGCATTGT CTGGATGCTT GCCGTTTTGG TGCTGCTCGC 

951 CTTTAATCCG CAGCGTTTTC AAGACAACCT CGTCTGGCTG CTGCCGCCGC 

1001 TTGCCCTGTT CGGCGCGGCG CAACTGGACA GCCTGAGGCG CGGCGCGGCG 

1051 GCTTTTGTCA ACTGGTTCGG CATTATGGCG TTCGGGCTGT TTGCCGTGTT 

1101 CCTGTGGACG GGCTTTTTCG CCATGAATTA CGGCTGGCCC GCCAAGCTTG 

1151 CCGAACGCGC CGCCTACTTC AGCCCGTATT ACGTTCCCGA CATCGATCCC 

1201 ATTCCGATGG CGGTTGCCGT ACTGTTCACA CCCTTGTGGC TGTGGGCGAT 

1251 TACCCGGAAA AACATACGCG GCAGGCAGGC GGTTACCAAC TGGGCGGCAG 

1301 GCGTTACCCT GACCTGGGCT TTGCTGATGA CGCTGTTCCT GCCGTGGCTG 

1351 GACGCGGCGA AAAGCCACGC GCCCGTCGTC CGGAGTATGG AGGCATCGTT 

1401 TTCCCCGGAA TTAAAACGGG AGCTTTCAGA CGGCATCGAG TGTATCGGCA 

1451 TAGGCGGCGG CGACCTGCAC ACGCGGATTG TTTGGACGCA GTACGGCACA 

1501 TTGCCGCACC GCGTCGGCGA TGTCCGTTGC CGCTACCGTA TCGTCCGCCT 

1551 GCCCCAAAAC GCGGATGCGC CGCAAGGCTG GCAGACGGTC TGGCAGGGTG 

1601 CGCGCCCGCG CAACAAAGAC AGTAAGTTTG CACTGATACG GAAAATCGGG 

1651 GAAAATATAT TAAAAACAAC AGATTGA 

This corresponds to the amino acid sequence <SEQ ID 602; ORF141ng-1>:



1 MLTYTPPDAR PPAKTHEKPW LLLLMAFAWL WPGVFSHDLW NPAEPAVYTA

51 VEALAGSPTP LVAHLFGQTD FGIPPVYLWV AAAFKHLLSP WAADPYDAAR 

101 FAGVFFAVIG LTSCGFAGFN FLGRHHGRSV VLIHIGCIGL IPVAHFLNPA

151 AAAFAAAGLV LHGYSLARRR VIAASFLLGT GWTLMSLAAA YPAAFALMLP

201 LPVLMFFRPW QSRRLMLTAV ASLAFALPLM TVYPLLLAKT QPALFAQWLN

251 YHVFGTFGGV RHIQRAFSLF HYLKNLLWFA PPGLPLAVWT VCRTRLFSTD 

301 WGILGIVWML AVLVLLAFNP QRFQDNLVWL LPPLALFGAA QLDSLRRGAA 
351 AFVNWFGIMA FGLFAVFLWT GFFAMNYGWP AKLAERAAYF SPYYVPDIDP
401 IPMAVAVLFT PLWLWAITRK NIRGRQAVTN WAAGVTLTWA LLMTLFLPWL
451 DAAKSHAPVV RSMEASFSPE LKRELSDGIE CIGIGGGDLH TRIVWTQYGT
501 LPHRVGDVRC RYRIVRLPQN ADAPQGWQTV WQGARPRNKD SKFALIRKIG 
551 ENILKTTD* 

ORF141ng-1 and ORF141-1 show 97.5% identity in 553 aa overlap:

orf141ng-1.pep MLTYTPPDARPPAKTHEKPWLLLLMAFAWLWPGVFSHDLWNPAEPAVYTAVEALAGSPTP
I I I II I I I I I I II I I I I I I I I I I I I I I i I I I I I I I I I I I I M I I I I I I I I I I I I I I I I I 
orf141-1 MLTYTPPDARPPAKTHEKPWLLLLMAFAWLWPGVFSHDLWNPDEPAVYTAVEALAGSPTP

orf141ng-1.pep LVAHLFGQTDFGIPPVYLWVAAAFKHLLSPWAADPYDAARFAGVFFAVIGLTSCGFAGFN
I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I II I I I I I I I I I I II I I I I I I I I I I I 
orf141-1 LVAHLFGQTDFGIPPVYLWVAAAFKHLLSPWAADSYDAARFAGVFFAVIGLTSCGFAGFN

orf141ng-1.pep FLGRHHGRSVVLIHIGCIGLIPVAHFLNPAAAAFAAAGLVLHGYSLARRRVIAASFLLGT
I I I I I I I I I I I I I I I I I I I 1 I I I I I I II I I I I I I II I I I II I I I I I II I I 1 I I I I I I I I 
orf141-1 FLGRHHGRSVVLILIGCIGLIPVAHFLNPAAAAFAAAGLVLHGYSLARRRVIAASFLLGT

orf141ng-1.pep GWTLMSLAAAYPAAFALMLPLPVLMFFRPWQSRRLMLTAVASLAFALPLMTVYPLLLAKT
I i I I II I I I I I I I I I I I I I I I i I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
orf141-1 GWTLMSLAAAYPAAFALMLPLPVLMFFRPWQSRRLMLTAVASLAFALPLMTVYPLLLAKT

orf141ng-1.pep QPALFAQWLNYHVFGTFGGVRHIQRAFSLFHYLKNLLWFAPPGLPLAVWTVCRTRLFSTD
I I I I I III I : I I I I I I I I I I I I : I I I I I i : I I I I I II I I I : I I I I I II I I II I I I I I I 
orf141-1 QPALFAQWLDYHVFGTFGGVRHVQTAFSLFYYLKNLLWFALPALPLAVWTVCRTRLFSTD

orf141ng-1.pep WGILGIVWMLAVLVLLAFNPQRFQDNLVWLLPPLALFGAAQLDSLRRGAAAFVNWFGIMA
I I I I I : I II I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I II I I I 1 I I I I I II I I I I I I 
orf141-1 WGILGVVWMLAVLVLLAVNPQRFQDNLVWLLPPLALFGAAQLDSLRRGAAAFVNWFGIMA

orf141ng-1.pep FGLFAVFLWTGFFAMNYGWPAKLAERAAYFSPYYVPDIDPIPMAVAVLFTPLWLWAITRK

I I I I I I I I II I II I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I II I I I I I I I 
orf141-1 FGLFAVFLWTGFFAMNYGWPAKLAERAAYFSPYYVPDIDPIPMAVAVLFTPLWLWAITRK

orf141ng-1.pep NIRGRQAVTNWAAGVTLTWALLMTLFLPWLDAAKSHAPVVRSMEASFSPELKRELSDGIE

II II III I II INI I I II llll III M I I II M II II II IIMMI:I I II I llll IIM 
orf141-1 NIRGRQAVTNWAAGVTLTWALLMTLFLPWLDAAKSHAPVVRSMEASLSPELKRELSDGIE

orf141ng-1.pep CIGIGGGDLHTRIVWTQYGTLPHRVGDVRCRYRIVRLPQNADAPQGWQTVWQGARPRNKD
I I II II I I I I I I I I I I I I I I I I I I I II I : I I I I I I I I I I I II I II I II I I I I I I I I I I I 
orf141-1 CIGIGGGDLHTRIVWTQYGTLPHRVGDVQCRYRIVLLPQNADAPQGWQTVWQGARPRNKD

orf141ng-1.pep SKFALIRKIGENILKTTDX

I I I 1 I I I I I I I I I 
orf141-1 SKFALIRKIGENIX

Based on the presence of several putative transmembrane domains in the gonococcal protein, it is
predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be
useful antigens for vaccines or diagnostics, or for raising antibodies.
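The text does not say how the transmembrane domains were assigned. One common way to flag candidate membrane-spanning stretches is a Kyte-Doolittle hydropathy scan. The Python sketch below assumes that approach, with the classic 19-residue window and a mean-hydropathy cutoff of 1.6. It is an illustration only and is not necessarily the method used here. The name `hydrophobic_segments` is hypothetical, and treating unknown residues (X) as 0.0 is an assumption of the sketch.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobic_segments(protein, window=19, threshold=1.6):
    """Return merged 1-based (start, end) spans whose window mean >= threshold."""
    # Drop the blocking spaces used in the listings and any trailing stop symbol.
    seq = protein.replace(" ", "").rstrip("*").upper()
    segments = []
    for i in range(len(seq) - window + 1):
        # Unknown residues (e.g. X) score 0.0: an assumption of this sketch.
        mean = sum(KYTE_DOOLITTLE.get(aa, 0.0) for aa in seq[i:i + window]) / window
        if mean >= threshold:
            start, end = i + 1, i + window
            # Merge windows that overlap or touch the previous candidate segment.
            if segments and start <= segments[-1][1] + 1:
                segments[-1] = (segments[-1][0], max(segments[-1][1], end))
            else:
                segments.append((start, end))
    return segments
```

Each returned pair marks a stretch in which every contributing window averages at or above the cutoff, for example when the scan is run on a sequence such as ORF141ng-1. These stretches are candidates only. A dedicated topology predictor would still be needed to confirm them.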



Example 72 



The following partial DNA sequence was identified in N. meningitidis <SEQ ID 603>:

1 . . CAATCCGCCA AATGGTTATC GGGCCAAACT CTAGTCGGCA CAGCAATTGG 

51 GATACGCGGG CAGATAAAGC TTGGCGGCAA CCTGCATTAC GATATATTTA 

101 CCGGCCGCGC ATTGAAAAAG CCCGAATTTT TCCAATCAAG GAAATGGGCA 

151 AGCGGTTTTC AGGTAGGCTA TACGTTTTAA 

This corresponds to the amino acid sequence <SEQ ID 604; ORF142>: 

1 . . QSAKWLSGQT LVGTAIGIRG QIKLGGNLHY DIFTGRALKK PEFFQSRKWA 
51 SGFQVGYTF*

Further work revealed the complete nucleotide sequence <SEQ ID 605>: 
1 ATGGATAATT CGGGTAGTGA GGCGACAGGA AAATACCAAG GAAATATCAC

51 TTTCTCTGCC GACAATCCTT TGGGACTGAG TGATATGTTC TATGTAAATT 

101 ATGGACGTTC GATTGGCGGT ACGCCCGATG AGGAAAGTTT TGACGGCCAT 

151 CGCAAAGAAG GCGGATCAAA CAATTACGCC GTACATTATT CAGCCCCTTT 

201 CGGTAAATGG ACATGGGCAT TCAATCACAA TGGCTACCGT TACCATCAGG 

251 CAGTTTCCGG ATTATCGGAA GTCTATGACT ATAATGGAAA AAGTTACAAT 

301 ACTGATTTCG GCTTCAACCG CCTGTTGTAT CGTGATGCCA AACGCAAAAC 

351 CTATCTCGGT GTAAAACTGT GGATGAGGGA AACAAAAAGT TACATTGATG 

401 ATGCCGAACT GACTGTACAA CGGCGTAAAA CTGCGGGTTG GTTGGCAGAA 

451 CTTTCCCACA AAGAATATAT CGGTCGCAGT ACGGCAGATT TTAAGTTGAA 

501 ATATAAACGC GGCACCGGCA TGAAAGATGC TCTGCGCGCG CCTGAAGAAG 

551 CCTTTGGCGA AGGCACGTCA CGTATGAAAA TTTGGACGGC ATCGGCTGAT 

601 GTAAATACTC CTTTTCAAAT CGGTAAACAG CTATTTGCCT ATGACACATC

651 CGTTCATGCA CAATGGAACA AAACCCCGCT AACATCGCAA GACAAACTGG 

701 CTATCGGCGG ACACCACACC GTACGTGGCT TCGACGGTGA AATGAGTTTG 

751 TCTGCCGAGC GGGGATGGTA TTGGCGCAAC GATTTGAGCT GGCAATTTAA 

801 ACCAGGCCAT CAGCTTTATC TTGGGGCTGA TGTAGGACAT GTTTCAGGAC 

851 AATCCGCCAA ATGGTTATCG GGCCAAACTC TAGTCGGCAC AGCAATTGGG 

901 ATACGCGGGC AGATAAAGCT TGGCGGCAAC CTGCATTACG ATATATTTAC 

951 CGGCCGCGCA TTGAAAAAGC CCGAATTTTT CCAATCAAGG AAATGGGCAA 

1001 GCGGTTTTCA GGTAGGCTAT ACGTTTTAA 

This corresponds to the amino acid sequence <SEQ ID 606; ORF142-l>: 



1 MDNSGSEATG KYQGNITFSA DNPLGLSDMF YVNYGRSIGG TPDEESFDGH 

51 RKEGGSNNYA VHYSAPFGKW TWAFNHNGYR YHQAVSGLSE VYDYNGKSYN 

101 TDFGFNRLLY RDAKRKTYLG VKLWMRETKS YIDDAELTVQ RRKTAGWLAE 

151 LSHKEYIGRS TADFKLKYKR GTGMKDALRA PEEAFGEGTS RMKIWTASAD 

201 VNTPFQIGKQ LFAYDTSVHA QWNKTPLTSQ DKLAIGGHHT VRGFDGEMSL 

251 SAERGWYWRN DLSWQFKPGH QLYLGADVGH VSGQSAKWLS GQTLVGTAIG 

301 IRGQIKLGGN LHYDIFTGRA LKKPEFFQSR KWASGFQVGY TF*
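SEQ ID 606 is the translation of SEQ ID 605 using the standard genetic code, read in the frame that begins at the ATG. A minimal Python sketch of that step is shown below. It assumes NCBI translation table 1 and maps any codon containing an ambiguity code (N, W, k and so on) to X, which matches how the listings here show unresolved residues. The names `translate` and `SEQ605_START` are our own.

```python
# Standard genetic code (NCBI table 1); codons indexed with bases ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(dna):
    """Translate in frame 1; codons containing ambiguity codes become X."""
    seq = dna.replace(" ", "").upper()
    return "".join(
        CODON_TABLE.get(seq[pos:pos + 3], "X") for pos in range(0, len(seq) - 2, 3)
    )


# First 150 nt of SEQ ID 605 (rows 1, 51 and 101 of the listing).
SEQ605_START = (
    "ATGGATAATTCGGGTAGTGAGGCGACAGGAAAATACCAAGGAAATATCAC"
    "TTTCTCTGCCGACAATCCTTTGGGACTGAGTGATATGTTCTATGTAAATT"
    "ATGGACGTTCGATTGGCGGTACGCCCGATGAGGAAAGTTTTGACGGCCAT"
)

print(translate(SEQ605_START))
# MDNSGSEATGKYQGNITFSADNPLGLSDMFYVNYGRSIGGTPDEESFDGH
```

The printed string is residues 1 to 50 of SEQ ID 606. The stop codon TAA translates to `*`, which is the symbol the listings use.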

Computer analysis of this amino acid sequence gave the following results: 



Homology with a predicted ORF from N. gonorrhoeae

ORF142 shows 88.1% identity over a 59aa overlap with a predicted ORF (ORF142ng) from 
N. gonorrhoeae:

orf142.pep QSAKWLSGQTLVGTAIGIRGQIKLGGNLHY 30

I ! I I I I I I I I I : I I I I I I I I I I I I I I I I I I 
orf142ng RGWYWRNDLSWQFKPGHQLYLGADVGHVSGQSAKWLSGQTLAGTAIGIRGQIKLGGNLHY 313



orf142.pep DIFTGRALKKPEFFQSRKWASGFQVGYTF 59

I I I I I I t I I I I I: I I :: I I I I I II 1:1 

orf142ng DIFTGRALKKPEYFQTKKWVTGFQVGYSF 342
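The 88.1% figure can be reproduced by counting identical positions in this gap-free 59-residue overlap: 52 of the 59 positions match. The minimal Python sketch below assumes an ungapped alignment of two equal-length strings. Gapped alignments, such as the BLAST output elsewhere in this document, would need their gap columns handled separately. `percent_identity` is a hypothetical helper, not a named tool from the source.

```python
def percent_identity(a, b):
    """Identical positions over a gap-free alignment of two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    identical = sum(1 for x, y in zip(a, b) if x == y)
    return identical, len(a), 100.0 * identical / len(a)


# The 59-residue overlap shown above (ORF142 residues 1-59, ORF142ng residues 284-342).
ORF142 = "QSAKWLSGQTLVGTAIGIRGQIKLGGNLHYDIFTGRALKKPEFFQSRKWASGFQVGYTF"
ORF142NG_OVERLAP = "QSAKWLSGQTLAGTAIGIRGQIKLGGNLHYDIFTGRALKKPEYFQTKKWVTGFQVGYSF"

same, length, pct = percent_identity(ORF142, ORF142NG_OVERLAP)
print(f"{same}/{length} = {pct:.1f}%")  # 52/59 = 88.1%
```

An X in either sequence counts as a mismatch under this scheme. That is one reason reported identities for partial ORFs containing X can sit slightly below those of the completed sequences.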

The complete length ORF142ng nucleotide sequence <SEQ ID 607> is: 

1 ATGGATAATT CGGGTAGTGA GGCGACAGGA AAATACCAAG GAAATATCAC 

51 TTTCTCTGCC GACAATCCTT TTGGACTGAG TGATATGTTC TATGTAAATT 

101 ATGGACGTTC AATTGGCGGT ACGCCCGATG AGGAAAATTT TGACGGCCAT 

151 CGCAAAGAAG GCGGATCAAA CAATTACGCC GTACATTATT CAGCCCCTTT 

201 CGGTAAATGG ACATGGGCAT TCAATCACAA TGGCTACCGT TACCATCAGG 

251 CGGTTTCCGG ATTATCGGAA GTCTATGACT ATAATGGAAA AAGTTACAAC 

301 ACTGATTTCG GCTTCAACCG CCTGTTGTAT CGTGATGCCA AACGCAAAAC 

351 CTATCTCAGT GTAAAACTGT GGACGAGGGA AACAAAAAGT TACATTGATG 

401 ATGCCGAACT GACTGTACAA CGGCGTAAAA CCACAGGTTG GTTGGCAGAA 

451 CTTTCCCACA AAGGATATAT CGGTCGCAGT ACGGCAGATT TTAAGTTGAA 

501 ATATAAACAC GGCACCGGCA TGAAAGATGC TCTGCGCGCG CCTGAAGAAG 

551 CCTTTGGCGA AGGCACGTCA CGTATGAAAA TTTGGACGGC ATCGGCTGAT 

601 GTAAATACTC CTTTTCAAAT CGGTAAACAG CTATTTGCCT ATGACACATC 

651 CGTTCATGCA CAATGGAACA AAACCCCGCT AACATCGCAA GACAAACTGG 

701 CTATCGGCGG ACACCACACC GTACGTGGCT TCGACGGTGA AATGAGTTTG 

751 CCTGCCGAGC GGGGATGGTA TTGGCGCAAC GATTTGAGCT GGCAATTTAA 

801 ACCAGGCCAT CAGCTTTATC TTGGGGCTGA TGTAGGACAT GTTTCAGGAC 

851 AATCCGCCAA ATGGTTATCG GGCCAAACTC TAGCCGGCAC AGCAATTGGG 

901 ATACGCGGGC AGATAAAGCT TGGCGGCAAC CTGCATTACG ATATATTTAC 

951 CGGCCGTGCA TTGAAAAAGC CCGAATATTT TCAGACGAAG AAATGGGTAA 
1001 CGGGGTTTCA GGTGGGTTAT TCGTTTTGA

This encodes a protein having amino acid sequence <SEQ ID 608>: 

1 MDNSGSEATG KYQGNITFSA DNPFGLSDMF YVNYGRSIGG TPDEENFDGH 

51 RKEGGSNNYA VHYSAPFGKW TWAFNHNGYR YHQAVSGLSE VYDYNGKSYN 

101 TDFGFNRLLY RDAKRKTYLS VKLWTRETKS YIDDAELTVQ RRKTTGWLAE

151 LSHKGYIGRS TADFKLKYKH GTGMKDALRA PEEAFGEGTS RMKIWTASAD 

201 VNTPFQIGKQ LFAYDTSVHA QWNKTPLTSQ DKLAIGGHHT VRGFDGEMSL 

251 PAERGWYWRN DLSWQFKPGH QLYLGADVGH VSGQSAKWLS GQTLAGTAIG 

301 IRGQIKLGGN LHYDIFTGRA LKKPEYFQTK KWVTGFQVGY SF*

The underlined sequence (aromatic-Xaa-aromatic amino acid motif) is usually found at the
C-terminal end of outer membrane proteins.
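The aromatic-Xaa-aromatic rule can be written as a regular expression. The sketch below makes two assumptions: the aromatic residues are F, W and Y (some schemes also include H), and the motif must occupy the last three residues before an optional stop symbol `*`. Under those assumptions, both ORF142-1 (ending GYTF) and ORF142ng (ending GYSF) carry the motif. `has_c_terminal_aromatic_motif` is an illustrative name.

```python
import re

# Aromatic residues taken as F, W, Y (an assumption of this sketch); X is any residue.
C_TERMINAL_MOTIF = re.compile(r"[FWY][A-Z][FWY]\*?$")


def has_c_terminal_aromatic_motif(protein):
    """True if the last three residues (before an optional '*') fit aromatic-Xaa-aromatic."""
    return C_TERMINAL_MOTIF.search(protein.replace(" ", "").upper()) is not None


print(has_c_terminal_aromatic_motif("KWASGFQVGY TF*"))  # ORF142-1 C-terminus: True
print(has_c_terminal_aromatic_motif("KWVTGFQVGY SF*"))  # ORF142ng C-terminus: True
```

Anchoring the pattern at the end of the string reflects the statement that the motif sits at the C-terminus. A looser scan for internal occurrences would need a different pattern.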

ORF142ng and ORF142-1 show 95.6% identity over 342aa overlap: 

orf142-1.pep MDNSGSEATGKYQGNITFSADNPLGLSDMFYVNYGRSIGGTPDEESFDGHRKEGGSNNYA
IMIMI I M I I I I I II ! Mil 1:1 I IIMM I MMMI It !!l:l! Mill II II I I I 
orf142ng-1 MDNSGSEATGKYQGNITFSADNPFGLSDMFYVNYGRSIGGTPDEENFDGHRKEGGSNNYA

orf142-1.pep VHYSAPFGKWTWAFNHNGYRYHQAVSGLSEVYDYNGKSYNTDFGFNRLLYRDAKRKTYLG
I I I i I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I : 
orf142ng-1 VHYSAPFGKWTWAFNHNGYRYHQAVSGLSEVYDYNGKSYNTDFGFNRLLYRDAKRKTYLS






orf142-1.pep VKLWMRETKSYIDDAELTVQRRKTAGWLAELSHKEYIGRSTADFKLKYKRGTGMKDALRA
I I I I I I I I II I I I I I I I I I I I I I : I II I I I I I I I I I I I I I I I I I I I I: I I I I I I I I I I 
orf142ng-1 VKLWTRETKSYIDDAELTVQRRKTTGWLAELSHKGYIGRSTADFKLKYKHGTGMKDALRA



orf142-1.pep PEEAFGEGTSRMKIWTASADVNTPFQIGKQLFAYDTSVHAQWNKTPLTSQDKLAIGGHHT

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I i I I I I 
orf142ng-1 PEEAFGEGTSRMKIWTASADVNTPFQIGKQLFAYDTSVHAQWNKTPLTSQDKLAIGGHHT

orf142-1.pep VRGFDGEMSLSAERGWYWRNDLSWQFKPGHQLYLGADVGHVSGQSAKWLSGQTLVGTAIG
I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I i I I I I I I I I I : I I I I I

orf142ng-1 VRGFDGEMSLPAERGWYWRNDLSWQFKPGHQLYLGADVGHVSGQSAKWLSGQTLAGTAIG

orf142-1.pep IRGQIKLGGNLHYDIFTGRALKKPEFFQSRKWASGFQVGYTF
Mil IIMM I I I II Mill III ll:M::||::l llll 1:1 
orf142ng-1 IRGQIKLGGNLHYDIFTGRALKKPEYFQTKKWVTGFQVGYSF

In addition, ORF142ng is homologous to the HecB protein of E. chrysanthemi:

gi|1772622 (L39897) HecB [Erwinia chrysanthemi] Length = 558
Score = 119 bits (295), Expect = 3e-26

Identities = 88/346 (25%), Positives = 151/346 (43%), Gaps = 22/346 (6%)






Query: 2 DNSGSEATGKYQGNITFSADNPFGLSDMFYVNYGRSIGGTPDEENFDGHRKEGGSNNYAV 61 

DNSG ++TG+ Q N + + DN FGL+D ++++ G S + + D + G 
Sbjct: 230 DNSGQKSTGEEQLNGSLALDNVFGLADQWFISAGHS SRFATSHDAESLQAG 280 



Query: 62 HYSAPFGKWTWAFNHNGYRYHQAVSGLSEVYDYNGKSYNTDFGFNRLLYRDAKRKTYLSV 121

+S P+G W +N++ RY + G S F +R+++RD KT ++ 

Sbjct: 281 -FSMPYGYWNLGYNYSQSRYRNTFINRDFPWHSTGDSDTHRFSLSRWFRDGTMKTAIAG 339 

Query: 122 KLWTRETKSYIDDAELTVQRRKTTGWLAELSHKGYIGRSTADFKLKYKHGTGMKDALRAP 181 
R +Y++ + L RK + ++H + A F Y G +

Sbjct: 340 TFSQRTGNNYLNGSLLPSSSRKLSSVSLGVNHSQKLWGGLATFNPTYNRGVRWLGSETDT 399 

Query: 182 EEAFGEGTSRMKIWTASADVNTPFQIGKQLFAYDTSVHAQWNKTPLTSQDKLAIGGHHTV 241 
+++ E + WT SA P Y S++ Q++ L ++L +GG ++ 

Sbjct: 400 DKSADEPRAEFNKWTLSASYYHPV TDSITYLGSLYGQYSARALYGSEQLTLGGESSI 456

Query: 242 RGFDGEMSLPAERGWYWRNDLSWQFKP GHQLYLGA- DVGHVSGQSAKWLSGQTLAG 296 

RGF E RG YWRN+L+WQ G+ ++ A D GH+ + +L G 

Sbjct: 457 RGF-REQYTSGNRGAYWRNELNWQAWQLPVLGNVTFMAAVDGGHLYNHKQDNSTAASLWG 515 






Query: 297 TAIGIRGQIKLGGNLHYDIFTGRALKKPEYFQTKKWVTGFQVGYSF 342 
A+G+ + L+G+P+Q V G++VG SF 
Sbjct: 516 GAVGMTVASRW L S QQVT VGW P I S Y PAWLQ P DTMWG YRVGL S F 558

On the basis of this analysis, it is predicted that the proteins from N. meningitidis and 
N. gonorrhoeae, and their epitopes, could be useful antigens for vaccines or diagnostics, or for 
raising antibodies. 

Example 73 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 609>: 

1 ATGCGGACGA AATGGTCAGC AGTGAGAAGC TGCTJACTTG GgCGGACACC 

51 GCCGACATCG ATACCGCTTT GAACCTGTTG TACCGTTTGC AAAAACTCGA 

101 ATTCCTCTAT GGCGATGAAA ACGGTCATTC AGACGGCATC AATTTGwCGG 

151 ACGAGCAATT GCCGTTGCTG ATGGAACAAT TGTCCGGCAG CGGTAAGGCG 

201 TTATTGGTCG ATCGGAACGG TCTGTATCTT GCCAACGCCA ATTTCCATCA 

251 TGAGGCGGCG GAAGAGTTGG GGTTGTTGGC GGCAGAAGTC GCACAGATGG 

301 AAAAGAAATA CCGGCTGCTG ATTAAGAACA AC. . 

This corresponds to the amino acid sequence <SEQ ID 610; ORF143>: 

1 MRTKWSAVRS CTWADTADID TALNLLYRLQ KLEFLYGDEN GHSDGINLXD
51 EQLPLLMEQL SGSGKALLVD RNGLYLANAN FHHEAAEELG LLAAEVAQME 
101 KKYRLLIKNN . . 

Further work revealed the complete nucleotide sequence <SEQ ID 611>:

1 ATGGAATCAA CACTTTCACT ACAAGCAAAT TTATATCCCC GCCTGACTCC 

51 TGCCGGTGCA TTTTATGCCG TATCCAGCGA TGCCCCCAGT GCCGGTAAAA 

101 CTTTGTTGCA CAGCCTGTTG AAAGCAGATG CGGACGAAAT GGTCAGCAGT 

151 GAGAAGCTGC TTACTTGGGC GGACACCGCC GACATCGATA CCGCTTTGAA 

201 CCTGTTGTAC CGTTTGCAAA AACTCGAATT CCTCTATGGC GATGAAAACG 

251 GTCATTCAGA CGGCATCAAT TTGTCGGACG AGCAATTGCC GTTGCTGATG 

301 GAACAATTGT CCGGCAGCGG TAAGGCGTTA TTGGTCGATC GGAACGGTCT 

351 GTATCTTGCC AACGCCAATT TCCATCATGA GGCGGCGGAA GAGTTGGGGT 

401 TGTTGGCGGC AGAAGTCGCA CAGATGGAAA AGAAATACCG GCTGCTGATT 

451 AAGAACAACC TGTATATCAA CAATAACGCT TGGGGCGTTT GCGATCCTTC 

501 CGGTCAGAGC GAATTGACAT TTTTCCCATT GTATATCGGT TCAACCAAAT 

551 TTATTTTGGT TATCGGCGGC ATTCCCGATT TGGGCAAAGA GGCATTTGTT 

601 ACTTTGGTAA GGATTTTATA CCGCCGTTAC AGCAACCGCG TGTAA 

This corresponds to the amino acid sequence <SEQ ID 612; ORF143-l>: 

1 MESTLSLQAN LYPRLTPAGA FYAVSSDAPS AGKTLLHSLL KADADEMVSS 

51 EKLLTWADTA DIDTALNLLY RLQKLEFLYG DENGHSDGIN LSDEQLPLLM

101 EQLSGSGKAL LVDRNGLYLA NANFHHEAAE ELGLLAAEVA QMEKKYRLLI 

151 KNNLYINNNA WGVCDPSGQS ELTFFPLYIG STKFILVIGG IPDLGKEAFV

201 TLVRILYRRY SNRV* 

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A) 

ORF143 shows 92.4% identity over a 105aa overlap with an ORF (ORF143a) from strain A of
N. meningitidis:

10 20 30 

orf143.pep MRTKWSAVRSCTWADTADIDTALNLLYRLQKLEFL

I : : I I I I I II I I I I I I I I I I I I I I I I 
orf143a GAFYAVSSDXPSAGKTLLHSLLKADADEMVSSEKLLTWAXTADIDTALNLLYRLQKLEFL
20 30 40 50 60 70 

40 50 60 70 80 90 

orf143.pep YGDENGHSDGINLXDEQLPLLMEQLSGSGKALLVDRNGLYLANANFHHEAAEELGLLAAE

1 1 1 1 1 1 1 1 1 1 1 1 1 i iii 1 1 ii i iii iii 1 1 iii in mm i inn ii mi iiiiii i 
orf143a YGDENGHSDGINLSDEQLPLLMEQLSGSGKALLVDRNGLYLANANFHHEAAEELGLLAAE
80 90 100 110 120 130 

100 110 
orf143.pep VAQMEKKYRLLIKNN
I I I I I I I I I I MM 

orf143a VAQMEKKYRLXIKNNLYINNNAWGVCDPSGQSELTFFPLYIGSTKFILVIGGIPDLGKEA
140 150 160 170 180 190 

The complete length ORF143a nucleotide sequence <SEQ ID 613> is: 



1 ATGGAATCAA CANTTTCACT ACAAGCAAAT TTATATCNCC GCCTGACTCC
51 TGCCGGTGCA TTTTATGCCG TATCCAGCGA TGNCCCCAGT GCCGGTAAAA
101 CTTTGTTGCA CAGCCTGTTG AAAGCGGATG CGGACGAAAT GGTNAGCAGT
151 GAGAAGCTGC TTACCTGGGC GGANACCGCC GACATCGATA CCGCTTTGAA
201 CCTGTTGTAC CGTTTGCAAA AACTCGAATT CCTCTATGGC GATGAAAACG
251 GTCATTCAGA CGGCATCAAT TTGTCGGACG AGCAATTGCC GTTGCTGATG
301 GAACAATTGT CCGGCAGCGG TAAGGCGTTA TTGGTCGATC GGAACGGTCT
351 GTATCTTGCC AACGCCAATT TCCATCATGA GGCGGCGGAA GAGTTGGGGT
401 TGTTGGCGGC AGAAGTCGCA CAGATGGAAA AGAAATACCG GCTGCNNATT
451 AAGAACAACC TGTATATCAA CAATAACGCT TGGGGCGTTT GCGATCCTTC
501 CGGTCAGAGC GAATTGACAT TTTTCCCATT GTATATCGGT TCAACCAAAT
551 TTATTTTGGT TATCGGCGGC ATTCCCGATT TGGGCAAAGA GGCATTTGTT
601 ACTTTGGTAA GGATNTTATA CCNCCNGTTA CAGCAACCGC GTGTAAAACT
651 TGGGAGAGAG GANGGGTTAT GCAGCAATTA TTGA



This encodes a protein having amino acid sequence <SEQ ID 614>: 

1 MESTXSLQAN LYXRLTPAGA FYAVSSDXPS AGKTLLHSLL KADADEMVSS 

51 EKLLTWAXTA DIDTALNLLY RLQKLEFLYG DENGHSDGIN LSDEQLPLLM 

101 EQLSGSGKAL LVDRNGLYLA NANFHHEAAE ELGLLAAEVA QMEKKYRLXI 

151 KNNLYINNNA WGVCDPSGQS ELTFFPLYIG STKFILVIGG IPDLGKEAFV

201 TLVRXLYXXL QQPRVKLGRE XGLCSNY* 

ORF143a and ORF143-1 show 97.1% identity in 207 aa overlap: 

orf143a.pep MESTXSLQANLYXRLTPAGAFYAVSSDXPSAGKTLLHSLLKADADEMVSSEKLLTWAXTA
Mi! I I I I I I I I I I I I I I I I I II I I I I I I I II II I I I I I I I I I I I I I I I I I I I I II 
orf143-1 MESTLSLQANLYPRLTPAGAFYAVSSDAPSAGKTLLHSLLKADADEMVSSEKLLTWADTA

orf143a.pep DIDTALNLLYRLQKLEFLYGDENGHSDGINLSDEQLPLLMEQLSGSGKALLVDRNGLYLA
I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I II I I I I 
orf143-1 DIDTALNLLYRLQKLEFLYGDENGHSDGINLSDEQLPLLMEQLSGSGKALLVDRNGLYLA

orf143a.pep NANFHHEAAEELGLLAAEVAQMEKKYRLXIKNNLYINNNAWGVCDPSGQSELTFFPLYIG
I I I I I I I 1 I I I I I I I II I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I 
orf143-1 NANFHHEAAEELGLLAAEVAQMEKKYRLLIKNNLYINNNAWGVCDPSGQSELTFFPLYIG

orf143a.pep STKFILVIGGIPDLGKEAFVTLVRXLY
I I I I I I I II II I I I I I I I II I I I I II 
orf143-1 STKFILVIGGIPDLGKEAFVTLVRILY

Homology with a predicted ORF from N. gonorrhoeae 

ORF143 shows 95.5% identity over a 110aa overlap with a predicted ORF (ORF143ng) from
N. gonorrhoeae: 



orf143.pep MRTKWSAVRSCTWADTADIDTALNLLYRLQKLEFLYGDENGHSDGINLXDEQLPLLMEQL 60
I I I I I I I I I I I : I I I I I I I It I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I
orf143ng MRTKWSAVRSCSRADTADIDTALNLLYRLQKLEFLYGDENGHSDGINLSDEQLPLLMEQL 60

orf143.pep SGSGKALLVDRNGLYLANANFHHEAAEELGLLAAEVAQMEKKYRLLIKNN 110
I I I I I I I I I I I I I I I I I I I I I I I I : I I I I I I I I I I I I I I I I I I I I I I : I I
orf143ng SGSGKALLVDRNGLYLANANFHHESAEELGLLAAEVAQMEKKYRLLIRNNLYINNNAWGV 120
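The identity figures quoted in these comparisons are consistent with a simple count: identical aligned columns divided by the length of the overlap. The Python sketch below is a hypothetical helper written for illustration, not the alignment program used for the analysis, which is not named here. It assumes the two rows are already aligned and of equal length, that a '-' gap column still counts toward the overlap, and that an unknown residue (X) never counts as identical. Applied to the ORF143/ORF143ng rows above, it finds 105 identical columns out of 110.

```python
def percent_identity(row_a: str, row_b: str) -> float:
    """Percent identity over the columns of a pairwise alignment (hypothetical helper).

    Assumes equal-length, pre-aligned rows; '-' is a gap column that still
    counts toward the overlap, and 'X' (unknown residue) never counts as a match.
    """
    if len(row_a) != len(row_b) or not row_a:
        raise ValueError("rows must be non-empty and of equal length")
    identical = sum(1 for a, b in zip(row_a, row_b) if a == b and a not in "-X")
    return 100.0 * identical / len(row_a)


# ORF143 / ORF143ng rows as shown above (110-column overlap).
orf143 = (
    "MRTKWSAVRSCTWADTADIDTALNLLYRLQKLEFLYGDENGHSDGINLXDEQLPLLMEQL"
    "SGSGKALLVDRNGLYLANANFHHEAAEELGLLAAEVAQMEKKYRLLIKNN"
)
orf143ng = (
    "MRTKWSAVRSCSRADTADIDTALNLLYRLQKLEFLYGDENGHSDGINLSDEQLPLLMEQL"
    "SGSGKALLVDRNGLYLANANFHHESAEELGLLAAEVAQMEKKYRLLIRNN"
)
print(f"{percent_identity(orf143, orf143ng):.1f}% over {len(orf143)} aa")  # 95.5% over 110 aa
```

Under the same assumptions, the ORF143ng-1/ORF143-1 alignment given further on has 205 identical columns out of 214 (its single gap column included), which rounds to the 95.8% reported there.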



An ORF143ng nucleotide sequence <SEQ ID 615> was predicted to encode a protein having amino acid sequence <SEQ ID 616>:

1 MRTKWSAVRS CSRADTADID TALNLLYRLQ KLEFLYGDEN GHSDGINLSD

51 EQLPLLMEQL SGSGKALLVD RNGLYLANAN FHHESAEELG LLAAEVAQME 

101 KKYRLLIRNN LYINNNAWGV CDPSGQSELT FFPLYIGSTK FILVIAGIPD

151 LSKGGICYFG KDFIPPLQQP RVKLGTGGIM RQLLISILED LNNTSTDIIA 

201 SAVISTDGLP MATMLPSHLN SDRVGAISAT LLALGSRSVQ ELACGELEQV 

251 MIKGKSGYIL LSQAGKDAVL VLVAKETGRL GLILLDAKRA ARHIAEAI*

Further work revealed the following gonococcal DNA sequence <SEQ ID 617>: 

1 ATGGAATCAA CACTTTCACT ACAAGCGAAT TTATATCCCT GCCTGACTCC 

51 TGCCGGTGCA TTTTATGCCG TATCCAGCGA TGCCCCCAGT GCCGGTAAAA 

101 CTTTGTTGCG CAGCCTGTTG AAAGCGGATG CGGACGAAGT GGTCAGCAGT 

151 GAGAAGCTGC TCGCGGCGGA CACCGCCGAC ATCGATACCG CTTTGAACCT 

201 GTTGTACCGT TTGCAAAAAC TCGAATTCCT CTATGGCGAT GAAAACGGTC 

251 ATTCAGACGG CATCAATTTG TCGGACGAGC AATTGCCGTT GCTGATGGAA 

301 CAATTGTCCG GCAGCGGTAA GGCATTATTG GTCGATCGGA ACGGTCTGTA 

351 TCTTGCCAAC GCCAATTTCC ATCATGAGTC GGCGGAAGAG TTGGGGTTGT 

401 TGGCGGCAGA AGTCGCACAG ATGGAAAAGA AATACCGGCT GCTGATTAGG 

451 AACAACCTGT ATATCAACAA TAACGCTTGG GGCGTTTGCG ATCCTTCCGG 

501 TCAGAGCGAA TTGACATTTT TCCCATTGTA TATCGGTTCA ACCAAATTTA 

551 TTTTGGTTAT CGCCGGCATT CCCGATTTGA GCAAAGAGGC ATTTGTTACT 

601 TTGGTAAGGA TTTTATACCG CCGTTACAGC AACCGCGTGT AA 

This corresponds to the amino acid sequence <SEQ ID 618; ORF143ng-1>:

1 MESTLSLQAN LYPCLTPAGA FYAVSSDAPS AGKTLLRSLL KADADEVVSS

51 EKLLAADTAD IDTALNLLYR LQKLEFLYGD ENGHSDGINL SDEQLPLLME

101 QLSGSGKALL VDRNGLYLAN ANFHHESAEE LGLLAAEVAQ MEKKYRLLIR

151 NNLYINNNAW GVCDPSGQSE LTFFPLYIGS TKFILVIAGI PDLSKEAFVT

201 LVRILYRRYS NRV*
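Each step of the form "this corresponds to the amino acid sequence" is a translation of the nucleotide sequence with the standard genetic code, reading in frame from the first base. A minimal sketch follows; the function and table names are illustrative. As an assumption about how unresolved bases are handled, an N is expanded to all four bases and the residue is kept only if every expansion agrees, otherwise X is reported. This reproduces, for example, V for the GTN codon at residue 48 of ORF143a (SEQ ID 613/614) and X for its NTT codon at residue 5.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1); the 64 codons are
# enumerated in T, C, A, G order, first position varying slowest.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate_codon(codon: str) -> str:
    """Translate one codon, expanding N to all four bases.

    If every expansion gives the same residue (e.g. GTN -> V) that residue is
    returned, otherwise 'X'. Other ambiguity codes are treated as unknown.
    """
    choices = [BASES if base == "N" else base for base in codon]
    residues = {CODON_TABLE.get("".join(c), "X") for c in product(*choices)}
    return residues.pop() if len(residues) == 1 else "X"


def translate(dna: str) -> str:
    """Translate from the first base, ignoring whitespace; stop after the first '*'."""
    seq = "".join(dna.split()).upper()
    protein = []
    for i in range(0, len(seq) - 2, 3):
        residue = translate_codon(seq[i:i + 3])
        protein.append(residue)
        if residue == "*":
            break
    return "".join(protein)


print(translate("ATGGAATCAA CACTTTCACT ACAAGCGAAT"))  # first 30 nt of SEQ ID 617 -> MESTLSLQAN
```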

ORF143ng-1 and ORF143-1 show 95.8% identity in 214 aa overlap:

orf143ng-1.pep MESTLSLQANLYPCLTPAGAFYAVSSDAPSAGKTLLRSLLKADADEVVSSEKLLA-ADTA 59

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I : I I I I I I I I I : I I I I I I I : I I I I 
orf143-1 MESTLSLQANLYPRLTPAGAFYAVSSDAPSAGKTLLHSLLKADADEMVSSEKLLTWADTA 60

orf143ng-1.pep DIDTALNLLYRLQKLEFLYGDENGHSDGINLSDEQLPLLMEQLSGSGKALLVDRNGLYLA 119

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
orf143-1 DIDTALNLLYRLQKLEFLYGDENGHSDGINLSDEQLPLLMEQLSGSGKALLVDRNGLYLA 120

orf143ng-1.pep NANFHHESAEELGLLAAEVAQMEKKYRLLIRNNLYINNNAWGVCDPSGQSELTFFPLYIG 179

I I I I I I I: I I I I I I I I I I I I I I I I I I I I I I: II I I I I I I I I I I I I I I I I I I I I I I I I I I I 
orf143-1 NANFHHEAAEELGLLAAEVAQMEKKYRLLIKNNLYINNNAWGVCDPSGQSELTFFPLYIG 180

orf143ng-1.pep STKFILVIAGIPDLSKEAFVTLVRILYRRYSNRV 213

I I I II I I I : I 1 I I I : I I I I I I II I I I I I I I II II 
orf143-1 STKFILVIGGIPDLGKEAFVTLVRILYRRYSNRV 214

Based on the presence of the putative transmembrane domains in the gonococcal protein, it is
predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be
useful antigens for vaccines or diagnostics, or for raising antibodies.
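The text does not say how the putative transmembrane domains were located. One widely used method, assumed here purely for illustration, is a Kyte-Doolittle hydropathy scan: the hydropathy index is averaged over a sliding window, and windows whose mean exceeds a cut-off are flagged as candidate membrane-spanning segments. Kyte and Doolittle proposed a 19-residue window and a cut-off of about 1.6 for this purpose; both are parameters of the hypothetical function below. Residues outside the table, such as X or the stop symbol *, are scored as 0.

```python
# Kyte & Doolittle (1982) hydropathy index for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobic_windows(protein: str, window: int = 19, threshold: float = 1.6):
    """Return (start, mean) for every window whose mean hydropathy exceeds threshold.

    Starts are 1-based residue numbers, as in the listings above.
    Residues outside the table (X, *) score 0; whitespace is ignored.
    """
    scores = [KYTE_DOOLITTLE.get(aa, 0.0) for aa in protein.upper() if not aa.isspace()]
    hits = []
    for i in range(len(scores) - window + 1):
        mean = sum(scores[i:i + window]) / window
        if mean > threshold:
            hits.append((i + 1, round(mean, 2)))
    return hits
```

Overlapping hits are expected: a single hydrophobic stretch longer than the window yields a run of consecutive start positions, which would normally be merged into one candidate segment.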



Example 74 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 619>:

1 ATGACCTTTT TACAACGTTT GCAAGGTTTG GCAGACAATA AAATCTGTGC 

51 GTTTGCATGG TTCGTCGTCC GCCGCTTTGA TGAAGAACGC GTACCGCAGr 

101 CGGCGGCAAG CATGACGTTT ACGACGCTGC TGGCACTCGT CCCCGTGCTG 

151 ACCGTGATGG TGGCGGTCGC TTCGATTTTC CCCGTGTTCG ACCGCTGGTC 

201 GGATTCGTTC GTCTCCTTCG TCAACCAAAC CATTGTGCCG CA.GGCGCGG 

251 ACATGGTGTT CGACTATATC AATGCGTTCC GCGAGCAGGC GAACCGGCTG 

301 ACGGCAATCG GCAGCGTGAT GCTGGTCGTT ACCTCGCTGA TGCTGATTCG 

351 GACGATAGAC AATACGTTCA ACCGCATCTG GaCGGGTCAA wTyCCAGCGT 

401 CCGTGGATG. . 
This corresponds to the amino acid sequence <SEQ ID 620; ORF144>:

1 MTFLQRLQGL ADNKICAFAW FVVRRFDEER VPQXAASMTF TTLLALVPVL
51 TVMVAVASIF PVFDRWSDSF VSFVNQTIVP XGADMVFDYI NAFREQANRL 
101 TAIGSVMLVV TSLMLIRTID NTFNRIWRVX XQRPWM...

Further work revealed the complete nucleotide sequence <SEQ ID 621>:



1 ATGACCTTTT TACAACGTTT GCAAGGTTTG GCAGACAATA AAATCTGTGC 

51 GTTTGCATGG TTCGTCGTCC GCCGCTTTGA TGAAGAACGC GTACCGCAGG 

101 CGGCGGCAAG CATGACGTTT ACGACGCTGC TGGCACTCGT CCCCGTGCTG 

151 ACCGTGATGG TGGCGGTCGC TTCGATTTTC CCCGTGTTCG ACCGCTGGTC 

201 GGATTCGTTC GTCTCCTTCG TCAACCAAAC CATTGTGCCG CAGGGCGCGG 

251 ACATGGTGTT CGACTATATC AATGCGTTCC GCGAGCAGGC GAACCGGCTG 

301 ACGGCAATCG GCAGCGTGAT GCTGGTCGTT ACCTCGCTGA TGCTGATTCG 

351 GACGATAGAC AATACGTTCA ACCGCATCTG GCGGGTCAAT TCCCAGCGTC 

401 CGTGGATGAT GCAGTTTCTC GTCTATTGGG CTTTACTGAC GTTCGGGCCG 

451 CTGTCTTTGG GCGTGGGCAT TTCCTTTATG GTCGGCTCGG TACAGGATGC 

501 CGCGCTTGCC TCAGGTGCGC CGCAGTGGTC GGGCGCGTTG CGAACGGCGG 

551 CGACGCTGAC CTTCATGACG CTTTTGCTGT GGGGGCTGTA CCGCTTCGTG 

601 CCAAACCGCT TCGTTCCCGC GCGGCAGGCG TTTGTCGGGG CTTTGGCAAC 

651 AGCGTTTTGT CTGGAAACCG CGCGCTCCCT CTTCACTTGG TATATGGGCA 

701 ATTTCGACGG CTACCGCTCG ATTTACGGCG CGTTTGCCGC CGTGCCGTTT 

751 TTTCTGTTGT GGCTGAACCT GTTGTGGACG CTGGTCTTGG GCGGCGCGGT 

801 GCTGACTTCT TCACTCTCCT ACTGGCAGGG AGAAGCGTTC CGCAGGGGCT 

851 TCGACTCGCG CGGACGGTTT GACGACGTGT TGAAAATCCT GCTGCTTCTG 

901 GATGCGGCGC AAAAAGAAGG CAAAGCCTTG CCTGTTCAGG AGTTCAGACG 

951 GCATATCAAT ATGGGCTACG ACGAGTTGGG CGAGCTTTTG GAAAAGCTGG 

1001 CGCGGCACGG CTACATCTAT TCCGGCAGAC AGGGTTGGGT GTTGAAAACG 

1051 GGGGCGGATT CGATTGAGTT GAACGAACTC TTCAAGCTCT TCGTTTACCG 

1101 TCCGTTGCCT GTGGAAAGGG ATCATGTGAA CCAAGCTGTC GATGCGGTAA 

1151 TGACACCGTG TTTGCAGACT TTGAACATGA CGCTGGCAGA GTTTGACGCT 

1201 CAGGCGAAAA AACGGCAGTA G 

This corresponds to the amino acid sequence <SEQ ID 622; ORF144-1>:

1 MTFLQRLQGL ADNKICAFAW FVVRRFDEER VPQAAASMTF TTLLALVPVL

51 TVMVAVASIF PVFDRWSDSF VSFVNQTIVP QGADMVFDYI NAFREQANRL

101 TAIGSVMLVV TSLMLIRTID NTFNRIWRVN SQRPWMMQFL VYWALLTFGP

151 LSLGVGISFM VGSVQDAALA SGAPQWSGAL RTAATLTFMT LLLWGLYRFV

201 PNRFVPARQA FVGALATAFC LETARSLFTW YMGNFDGYRS IYGAFAAVPF

251 FLLWLNLLWT LVLGGAVLTS SLSYWQGEAF RRGFDSRGRF DDVLKILLLL

301 DAAQKEGKAL PVQEFRRHIN MGYDELGELL EKLARHGYIY SGRQGWVLKT

351 GADSIELNEL FKLFVYRPLP VERDHVNQAV DAVMTPCLQT LNMTLAEFDA

401 QAKKRQ* 
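Sequences in these listings are printed as numbered rows of ten-character blocks, each row opening with the 1-based position of its first residue. When transcribing a listing, a small parser that rejoins the rows and checks that each row number follows on from the previous one catches dropped or duplicated blocks. The function below is a hypothetical sketch that assumes one row per line and skips blank lines.

```python
def parse_listing(text: str) -> str:
    """Rejoin a listing of numbered rows (e.g. '51 GTTTGCATGG TTCGTCGTCC ...').

    Each non-blank line must start with the 1-based position of its first
    character; a ValueError is raised if the numbering does not follow on.
    """
    sequence = ""
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # blank separator lines between rows
        start = int(fields[0])
        expected = len(sequence) + 1
        if start != expected:
            raise ValueError(f"row starts at {start}, expected {expected}")
        sequence += "".join(fields[1:])
    return sequence


example = """
1 ATGACCTTTT TACAACGTTT GCAAGGTTTG GCAGACAATA AAATCTGTGC

51 GTTTGCATGG
"""
seq = parse_listing(example)
print(len(seq), seq[:12])  # 60 ATGACCTTTTTA
```

The same arithmetic gives the length of SEQ ID 621: its last row starts at 1201 and holds 21 bases, so the sequence is 1,221 nt, or 407 codons, which is the 406 residues of ORF144-1 plus the TAG stop codon.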

Computer analysis of this amino acid sequence gave the following results: 



Homology with a predicted ORF from N. meningitidis (strain A)

ORF144 shows 96.3% identity over a 136aa overlap with an ORF (ORF144a) from strain A of N.
meningitidis:

10 20 30 40 50 60 

orf144.pep MTFLQRLQGLADNKICAFAWFVVRRFDEERVPQXAASMTFTTLLALVPVLTVMVAVASIF
I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
orf144a MTFLQRLQGLADNKICAFAWFVVRRFDEERVPQAAASMTFTTLLALVPVLTVMVAVASIF

10 20 30 40 50 60 

70 80 90 100 110 120 

orf144.pep PVFDRWSDSFVSFVNQTIVPXGADMVFDYINAFREQANRLTAIGSVMLVVTSLMLIRTID
I I I [ I II I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I II I II I I I I I I I I I I 
orf144a PVFDRWSDSFVSFVNQTIVPQGADMVFDYINAFREQANRLTAIGSVMLVVTSXMLIRTID

70 80 90 100 110 120 



130 

orf144.pep NTFNRIWRVXXQRPWM
I I I I I II I I I I I II 

orf144a NTFNRIWRVNSQRPWMMQFLVYWALLTFGPLSLGVGISFXVGSVQDAALASGAPQWSGAL
130 140 150 160 170 180

The complete length ORF144a nucleotide sequence <SEQ ID 623> is:

1 ATGACCTTTT TACAACGTTT GCAAGGTTTG GCAGACAATA AAATCTGTGC 

51 GTTTGCATGG TTCGTCGTCC GCCGCTTTGA TGAAGAACGC GTACCGCAGG 

101 CGGCGGCAAG CATGACGTTT ACGACACTGC TGGCACTCGT CCCCGTGCTG 

151 ACCGTGATGG TGGCGGTCGC TTCGATTTTC CCCGTGTTCG ACCGNTGGTC 

201 GGATTCGTTC GTCTCCTTCG TCAACCAAAC CATTGTGCCG CAGGGCGCGG 

251 ACATGGTNTT CGACTATATC AATGCGTTCC GCGAGCAGGC GAACCGGCTG 

301 ACGGCAATCG GCAGCGTGAT GCTGGTCGTT ACCTCGCNGA TGCTGATTCG 

351 GACGATAGAC AATACGTTCA ACCGCATCTG GCGGGTCAAT TCCCAGCGTC 

401 CGTGGATGAT GCAGTTTCTC GTCTATTGGG CTTTACTGAC GTTCGGGCCG 

451 CTGTCTTTGG GCGTGGGCAT TTCCTTTATN GTCGGCTCGG TACAGGATGC 

501 CGCGCTTGCC TCAGGTGCGC CGCAGTGGTC GGGCGCGTTG CGAACGGCGG 

551 CGACGCTGAN CTTCATGACG CTTTTGCTGT GGGGGCTGTA CCGCTNCGTG 

601 CCAAACCGCT TCGTTCCCGC GCGGCANGCG TTTGTCGGGG CTTTGGCAAC 

651 AGCGTTCTGT CTGGAAACCG CGCGTTCCCT CTTTACTTGG TATATGGGCA 

701 ATTTCGACGG CTACCGCTCG ATTTACGGNG CGTTTGCCGC CGTGCCGTTT 

751 TTTCTGTTGT GGCTGAACCT GTTGTGGACG CTGGTCTTGG GCGGCGCGGT 

801 GCTGACTTCT TCACTCTCCT ACTGGCAGGG AGAAGCGTTC CGCAGGGNCT 

851 TCGACTCGCG CGGACGGTTT GACGACGTGT TGAAAATCCT GCTGCTTCTG 

901 GATGCGGCGC AAAAAGAAGG CNAAGCCTTG CCTGTTCAGG AGTTCAGACG 

951 GCATATCAAT ATGGGCTACG ACGAGTTGGG CGAGCTTTTG GAAAAGCTGG 

1001 CGCGGCACGG CTACATCTAT TCCGGCAGAC AGGGTTGGGT GTTGAAAACG 

1051 GGGGCGGATT CGATTGAGTT GAACGAACTC TTCAAGCTCT TCGTTTACCG 

1101 TCCGTTGCCT GTGGAAAGGG ATCATGTGAA CCAAGCTGTC GATGCGGTAA 

1151 TGATGCCGTG TTTGCAGACT TTGAACATGA CGCTGGCAGA GTTTGACGCT 

1201 CAGGCGAAAA AACAGCAGCA ATCTTGA 

This encodes a protein having amino acid sequence <SEQ ID 624>: 



1 MTFLQRLQGL ADNKICAFAW FVVRRFDEER VPQAAASMTF TTLLALVPVL

51 TVMVAVASIF PVFDRWSDSF VSFVNQTIVP QGADMVFDYI NAFREQANRL

101 TAIGSVMLVV TSXMLIRTID NTFNRIWRVN SQRPWMMQFL VYWALLTFGP

151 LSLGVGISFX VGSVQDAALA SGAPQWSGAL RTAATLXFMT LLLWGLYRXV

201 PNRFVPARXA FVGALATAFC LETARSLFTW YMGNFDGYRS IYGAFAAVPF

251 FLLWLNLLWT LVLGGAVLTS SLSYWQGEAF RRXFDSRGRF DDVLKILLLL

301 DAAQKEGXAL PVQEFRRHIN MGYDELGELL EKLARHGYIY SGRQGWVLKT

351 GADSIELNEL FKLFVYRPLP VERDHVNQAV DAVMMPCLQT LNMTLAEFDA

401 QAKKQQQS*

ORF144a and ORF144-1 show 97.8% identity in 406 aa overlap: 



orf144a.pep MTFLQRLQGLADNKICAFAWFVVRRFDEERVPQAAASMTFTTLLALVPVLTVMVAVASIF
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I i I I I I I I I I I I I I I I I I I I I I I I t 
orf144-1 MTFLQRLQGLADNKICAFAWFVVRRFDEERVPQAAASMTFTTLLALVPVLTVMVAVASIF



orf144a.pep PVFDRWSDSFVSFVNQTIVPQGADMVFDYINAFREQANRLTAIGSVMLVVTSXMLIRTID
I I I I I i I I I I I I I i I I I I I I I I I I I I I I I I I I I I I I I I I I I I I i I I I I I I I I I I I I I I I 
orf144-1 PVFDRWSDSFVSFVNQTIVPQGADMVFDYINAFREQANRLTAIGSVMLVVTSLMLIRTID

orf144a.pep NTFNRIWRVNSQRPWMMQFLVYWALLTFGPLSLGVGISFXVGSVQDAALASGAPQWSGAL
I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 1 I I I I I I I I I I I M I I 
orf144-1 NTFNRIWRVNSQRPWMMQFLVYWALLTFGPLSLGVGISFMVGSVQDAALASGAPQWSGAL

orf144a.pep RTAATLXFMTLLLWGLYRXVPNRFVPARXAFVGALATAFCLETARSLFTWYMGNFDGYRS
I I I I I I : I I I I I II I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I 1 I I I I 
orf144-1 RTAATLTFMTLLLWGLYRFVPNRFVPARQAFVGALATAFCLETARSLFTWYMGNFDGYRS



orf144a.pep IYGAFAAVPFFLLWLNLLWTLVLGGAVLTSSLSYWQGEAFRRXFDSRGRFDDVLKILLLL
I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I II II I II I I I I I I I I I I I I II I 
orf144-1 IYGAFAAVPFFLLWLNLLWTLVLGGAVLTSSLSYWQGEAFRRGFDSRGRFDDVLKILLLL



orf144a.pep DAAQKEGXALPVQEFRRHINMGYDELGELLEKLARHGYIYSGRQGWVLKTGADSIELNEL
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
orf144-1 DAAQKEGKALPVQEFRRHINMGYDELGELLEKLARHGYIYSGRQGWVLKTGADSIELNEL



orf144a.pep FKLFVYRPLPVERDHVNQAVDAVMMPCLQTLNMTLAEFDAQAKKQQQS 408
II I I I I I t I i I I II I I I I I I I I I I I I I I I I I I I I I I I II I I I I I : I
orf144-1 FKLFVYRPLPVERDHVNQAVDAVMTPCLQTLNMTLAEFDAQAKKRQ 406
Homology with a predicted ORF from N. gonorrhoeae

ORF144 shows 91.2% identity over a 136aa overlap with a predicted ORF (ORF144ng) from 
N. gonorrhoeae: 

orf144.pep MTFLQRLQGLADNKICAFAWFVVRRFDEERVPQXAASMTFTTLLALVPVLTVMVAVASIF 60

I I I I I II I I I I I I I I I I I I : I I I : I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I 
orf144ng MTFLQCWQGSADNKICAFAWFVIRRFSEERVPQAAASMTFTTLLALVPVLTVMVAVASIF 60



orf144.pep PVFDRWSDSFVSFVNQTIVPXGADMVFDYINAFREQANRLTAIGSVMLVVTSLMLIRTID 120

MIIIIMMIIIIilllll I I I I f I I I I : I I I : 11 I I M I I I I i I I M I I I I I I I I M 
orf144ng PVFDRWSDSFVSFVNQTIVPQGADMVFDYIDAFRDQANRLTAIGSVMLVVTSLMLIRTID 120

orf144.pep NTFNRIWRVXXQRPWM 136
I: I I I I I I I M I I I I 

orf144ng NAFNRIWRVNTQRPWMMQFLVYWALLTFGPLSLGVGISFMVGSVQDSVLSSGAQQWADAL 180

The complete length ORF144ng nucleotide sequence <SEQ ID 625> is predicted to encode a
protein having amino acid sequence <SEQ ID 626>: 



1 MTFLQCWQGS ADNKICAFAW FVIRRFSEER VPQAAASMTF TTLLALVPVL

51 TVMVAVASIF PVFDRWSDSF VSFVNQTIVP QGADMVFDYI DAFRDQANRL

101 TAIGSVMLVV TSLMLIRTID NAFNRIWRVN TQRPWMMQFL VYWALLTFGP

151 LSLGVGISFM VGSVQDSVLS SGAQQWADAL KTAARLAFMT LLLWGLYRFV

201 PNRFVPARQA FVGALITAFC LETARFLFTW YMGNFDGYRS IYGAFAAVPF

251 FLLWLNLLWT LVLGGAVLTS SLSYWQGEAF RRGFDSRGRF DDVLKILLLL

301 DAAQKEGRTL SVQEFRRHIN MGYDELGELL EKLARYGYIY SGRQGWVLKT

351 GADSIELSEL FKLFVYRPLP VERDHVNQAV DAVMTPCLQT LNMTLAEFDA

401 QAKKQQQS* 

Further work revealed the following gonococcal DNA sequence <SEQ ID 627>: 



1 ATGACCTTTT TACAACGTTG GCAAGGTTTG GCGGACAATA AAATCTGTGC 

51 ATTTGCATGG TTCGTCATCC GCCGTTTCAG TGAAGAGCGC GTACCGCAGG 

101 CAGCGGCGAG CATGACGTTT ACGACACTGC TGGCACTCGT CCCCGTACTG 

151 ACCGTAATGG TCGCGGTCGC TTCGATTTTC CCCGTGTTCG ACCGCTGGTC 

201 GGATTCGTTC GTCTCCTTCG TCAACCAAAC CATTGTGCCG CAGGGCGCGG 

251 ATATGGTGTT CGACTATATC GACGCATTCC GCGATCAGGC AAACCGGCTG 

301 ACCGCCATCG GCAGCGTGAT GCTGGTCGTA ACCTCGCTGA TGCTGATTCG 

351 GACGATAGAC AATGCGTTCA ACCGCATCTG GCGGGTTAAC ACGCAACGCC 

401 CCTGGATGAT GCAGTTCCTC GTTTATTGGG CGTTGCTGAC TTTCGGGCCT 

451 TTGTCTTTGG GTGTGGGCAT TTCCTTTATG GTCGGGTCGG TTCAAGACTC 

501 CGTACTCTCC TCCGGAGCGC AACAATGGGC GGACGCGTTG AAGACGGCGG 

551 CAAGGCTGGC TTTCATGACG CTTTTGCTGT GGGGGCTGTA CCGCTTCGTG 

601 CCCAACCGCT TCGTGCCCGC CCGGCAGGCG TTTGTCGGAG CTTTGATTAC 

651 GGCATTCTGC CTGGAGACGG CACGTTTCCT GTTCACCTGG TATATGGGCA 

701 ATTTCGACGG CTACCGCTCG ATTTACGGCG CATTTGCCGC CGTGCCGTTT 

751 TTCCTGCTGT GGTTAAACCT GCTGTGGACG CTGGTCTTGG GCGGGGCGGT 

801 GCTGACTTCG TCGCTGTCTT ATTGGCAGGG CGAGGCCTTC CGCAGGGGAT 

851 TCGACTCGCG CGGACGGTTT GACGACGTGT TGAAAATCCT GCTGCTTCTG 

901 GATGCGGCGC AAAAAGAAGG CCGAACCCTG TCCGTTCAGG AGTTCAGACG 

951 GCATATCAAT ATGGGTTACG ATGAATTGGG CGAGCTTTTG GAAAAGCTGG 

1001 CGCGGTACGG CTATATCTAT TCCGGCAGAC AGGGCTGGGT TTTGAAAACG 

1051 GGGGCGGATT CGATTGAGTT GAGCGAACTC TTCAAGCTCT TCGTGTACCG 

1101 CCCGTTGCct gtggaAAGGG ATCATGTGAA CCAAGCTGtc gaTGCGGTAA 

1151 TGAcgccgtG TTTGCAGACT TTGAACATGA CGCTGGCGGA GTTTGACGCT 

1201 CAGgcgAAAA AACAGCAGCA GTCTTGA 

This encodes a variant of ORF144ng, having the amino acid sequence <SEQ ID 628; ORF144ng-1>:



1 MTFLQRWQGL ADNKICAFAW FVIRRFSEER VPQAAASMTF TTLLALVPVL

51 TVMVAVASIF PVFDRWSDSF VSFVNQTIVP QGADMVFDYI DAFRDQANRL

101 TAIGSVMLVV TSLMLIRTID NAFNRIWRVN TQRPWMMQFL VYWALLTFGP

151 LSLGVGISFM VGSVQDSVLS SGAQQWADAL KTAARLAFMT LLLWGLYRFV

201 PNRFVPARQA FVGALITAFC LETARFLFTW YMGNFDGYRS IYGAFAAVPF

251 FLLWLNLLWT LVLGGAVLTS SLSYWQGEAF RRGFDSRGRF DDVLKILLLL

301 DAAQKEGRTL SVQEFRRHIN MGYDELGELL EKLARYGYIY SGRQGWVLKT






351 GADSIELSEL FKLFVYRPLP VERDHVNQAV DAVMTPCLQT LNMTLAEFDA 
401 QAKKQQQS*

ORF144ng-1 and ORF144-1 show 94.1% identity in 406 aa overlap:

orf144ng-1.pep MTFLQRWQGLADNKICAFAWFVIRRFSEERVPQAAASMTFTTLLALVPVLTVMVAVASIF
I I I I I I I I i I I I I I I I I 1 I i (: I I I : I I I I I I I I 1 I I i I I I I I I ! I I I I I I I I I I I I I I

orf144-1 MTFLQRLQGLADNKICAFAWFVVRRFDEERVPQAAASMTFTTLLALVPVLTVMVAVASIF

orf144ng-1.pep PVFDRWSDSFVSFVNQTIVPQGADMVFDYIDAFRDQANRLTAIGSVMLVVTSLMLIRTID
I I I I II I I I I I I I I I I I I I II I I I I I I I I I : I I I : I I I I I I I I I I I I I I I I I I I I I I I I I 
orf144-1 PVFDRWSDSFVSFVNQTIVPQGADMVFDYINAFREQANRLTAIGSVMLVVTSLMLIRTID






orf144ng-1.pep NAFNRIWRVNTQRPWMMQFLVYWALLTFGPLSLGVGISFMVGSVQDSVLSSGAQQWADAL
I : I I I I I I I I : I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I : : I : I I I II: II 
orf144-1 NTFNRIWRVNSQRPWMMQFLVYWALLTFGPLSLGVGISFMVGSVQDAALASGAPQWSGAL

orf144ng-1.pep KTAARLAFMTLLLWGLYRFVPNRFVPARQAFVGALITAFCLETARFLFTWYMGNFDGYRS
: I I I I : I I I I I I I I i I I I I I I I I I I I I I I I I I II I I I I I I I II 1 I I I I I I I I I I I I I 
orf144-1 RTAATLTFMTLLLWGLYRFVPNRFVPARQAFVGALATAFCLETARSLFTWYMGNFDGYRS



orf144ng-1.pep IYGAFAAVPFFLLWLNLLWTLVLGGAVLTSSLSYWQGEAFRRGFDSRGRFDDVLKILLLL

I I I I I I I I I I I I I I I I I I I I I I I I I I I I 1 1 I I II I I I I I I I I I I I I I I I I I I I I I I I ! I I 
orf144-1 IYGAFAAVPFFLLWLNLLWTLVLGGAVLTSSLSYWQGEAFRRGFDSRGRFDDVLKILLLL

orf144ng-1.pep DAAQKEGRTLSVQEFRRHINMGYDELGELLEKLARYGYIYSGRQGWVLKTGADSIELSEL
I I I I I I I : : I I I I I I I I I I I I I I II I I I I I I I I I : I I II I I I I I I I I I I I I I II 1 I I I : I I

orf144-1 DAAQKEGKALPVQEFRRHINMGYDELGELLEKLARHGYIYSGRQGWVLKTGADSIELNEL

orf144ng-1.pep FKLFVYRPLPVERDHVNQAVDAVMTPCLQTLNMTLAEFDAQAKKQQQS
I I II I I II I I I I II I I I I I I I I I I I I I I I I I I I I II I I I I I I I I : I 
orf144-1 FKLFVYRPLPVERDHVNQAVDAVMTPCLQTLNMTLAEFDAQAKKRQ

On the basis of this analysis, including the identification of several putative transmembrane
domains in the gonococcal protein, it is predicted that the proteins from N. meningitidis and
N. gonorrhoeae, and their epitopes, could be useful antigens for vaccines or diagnostics, or for
raising antibodies.

Example 75

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 629>: 

1 ..AGACACGCCC GCCGCATCCG CATCGACACC GCCATCAACC CCGAACTGGA 

51 AGCCCTCGCC GAACACCTCC ACTACCAATG GCAGGGCTTC CTCTGGCTCA 

101 GCACCGATAT GCGTCAGGAA ATTTCCGCCC TCGTCATCCT GCTGCAACGC 

151 ACCCGCCGCA AATGGCTGGA TGCCCACGAA CGCCAACACC TGCGCCAAAG

201 CCTGCTTGAA ACACGGGAAC ACGGCTGA 

This corresponds to the amino acid sequence <SEQ ID 630; ORF146>: 

1 ..RHARRIRIDT AINPELEALA EHLHYQWQGF LWLSTDMRQE ISALVILLQR 
51 TRRKWLDAHE RQHLRQSLLE TREHG* 

Further work revealed the complete nucleotide sequence <SEQ ID 631>:

1 ATGAACACCT CGCAACGCAA CCGCCTCGTC AGCCGCTGGC TCAACTCCTA

51 CGAACGCTAC CGCTACCGCC GCCTCATCCA CGCCGTCCGG CTCGGCGGGG

101 CCGTCCTGTT CGCCACCGCC TCCGCCCGGC TGCTCCACCT CCAACACGGC

151 GAGTGGATAG GGATGACCGT CTTCGTCGTC CTCGGCATGC TCCAGTTTCA

201 AGGGGCGATT TACTCCAAGG CGGTGGAACG TATGCTCGGC ACGGTCATCG

251 GGCTGGGCGC GGGTTTGGGC GTTTTATGGC TGAACCAGCA TTATTTCCAC

301 GGCAACCTCC TCTTCTACCT CACCGTCGGC ACGGCAAGCG CACTGGCCGG

351 CTGGGCGGCG GTCGGCAAAA ACGGCTACGT CCCTATGCTG GCAGGGCTGA

401 CGATGTGTAT GCTCATCGGC GACAACGGCA GCGAATGGCT CGACAGCGGA

451 CTCATGCGCG CCATGAACGT CCTCATCGGC GCGGCCATCG CCATCGCCGC






501 CGCCAAACTG CTGCCGCTGA AATCCACACT GATGTGGCGT TTCATGCTTG

551 CCGACAACCT GGCCGACTGC AGCAAAATGA TTGCCGAAAT CAGCAACGGC

601 AGGCGCATGA CCCGCGAACG CCTCGAGGAG AACATGGCGA AAATGCGCCA

651 AATCAACGCA CGCATGGTCA AAAGCCGCAG CCATCTCGCC GCCACATCGG

701 GCGAAAGCCG CATCAGCCCC GCCATGATGG AAGCCATGCA GCACGCCCAC

751 CGTAAAATCG TCAACACCAC CGAGCTGCTC CTGACCACCG CCGCCAAGCT

801 GCAATCTCCC AAACTCAACG GCAGCGAAAT CCGGCTGCTT GACCGCCACT

851 TCACACTGCT CCAAACCGAC CTGCAACAAA CCGTCGCCCT TATCAACGGC

901 AGACACGCCC GCCGCATCCG CATCGACACC GCCATCAACC CCGAACTGGA

951 AGCCCTCGCC GAACACCTCC ACTACCAATG GCAGGGCTTC CTCTGGCTCA

1001 GCACCAATAT GCGTCAGGAA ATTTCCGCCC TCGTCATCCT GCTGCAACGC

1051 ACCCGCCGCA AATGGCTGGA TGCCCACGAA CGCCAACACC TGCGCCAAAG

1101 CCTGCTTGAA ACACGGGAAC ACGGCTGA



This corresponds to the amino acid sequence <SEQ ID 632; ORF146-1>:



1 MNTSQRNRLV SRWLNSYERY RYRRLIHAVR LGGAVLFATA SARLLHLQHG 

51 EWIGMTVFVV LGMLQFQGAI YSKAVERMLG TVIGLGAGLG VLWLNQHYFH

101 GNLLFYLTVG TASALAGWAA VGKNGYVPML AGLTMCMLIG DNGSEWLDSG 

151 LMRAMNVLIG AAIAIAAAKL LPLKSTLMWR FMLADNLADC SKMIAEISNG

201 RRMTRERLEE NMAKMRQINA RMVKSRSHLA ATSGESRISP AMMEAMQHAH 

251 RKIVNTTELL LTTAAKLQSP KLNGSEIRLL DRHFTLLQTD LQQTVALING 

301 RHARRIRIDT AINPELEALA EHLHYQWQGF LWLSTNMRQE ISALVILLQR 

351 TRRKWLDAHE RQHLRQSLLE TREHG* 

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A) 

ORF146 shows 98.6% identity over a 74aa overlap with an ORF (ORF146a) from strain A of N.
meningitidis:



10 20 30

orf146.pep                                 RHARRIRIDTAINPELEALAEHLHYQWQGF

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I
orf146a    KLNGSEIRLLDRHFTLLQTDLQQTVALINGRHARRIRIDTAINPELEALAEHLHYQWQGF

280 290 300 310 320 330

40 50 60 70

orf146.pep LWLSTDMRQEISALVILLQRTRRKWLDAHERQHLRQSLLETREHGX

I I I I I : I I 1 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I :
orf146a    LWLSTNMRQEISALVILLQRTRRKWLDAHERQHLRQSLLETREHSX

340 350 360 370

The complete length ORF146a nucleotide sequence <SEQ ID 633> is: 



1 ATGAACACCT CGCAACGCAA CCGCCTCGTC AGCCGCTGGC TCAACTCCTA

51 CGAACGCTAC CGCTACCGCC GCCTCATCCA CGCCGTCCGG CTCGGCGGGG

101 CCGTCCTGTT CGCCACCGCC TCCGCCCGGC TGCTCCACCT CCAACACGGC

151 GAGTGGATAG GGATGACCGT CTTCGTCGTC CTCGGCATGC TCCAGTTTCA

201 AGGGGCGATT TACTCCAAGG CGGTGGAACG TATGCTCGGC ACGGTCATCG

251 GGCTGGGCGC GGGTTTGGGC GTTTTATGGC TGAACCAGCA TTATTTCCAC

301 GGCAACCTCC TCTTCTACCT CACCGTCGGC ACGGCAAGCG CACTGGCCGG

351 CTGGGCGGCG GTCGGCAAAA ACGGCTACGT CCCTATGCTG GCGGGGCTGA

401 CGATGTGCAT GCTCATCGGC GACAACGGCA GCGAATGGTT CGACAGCGGC

451 CTGATGCGCG CGATGAACGT CCTCATCGGC GCGGCCATCG CCATCGCCGC

501 CGCCAAACTG CTGCCGCTGA AATCCACACT GATGTGGCGT TTCATGCTTG

551 CCGACAACCT GACCGACTGC AGCAAAATGA TTGCCGAAAT CAGCAACGGC

601 AGGCGCATGA CCCGCGAACG CCTCGAAGAG AACATGGCGA AAATGCGCCA

651 AATCAACGCA CGCATGGTCA AAAGCCGCAG CCACCTCGCC GCCACATCGG

701 GCGAAAGCCG CATCAGCCCC GCCATGATGG AAGCCATGCA GCACGCCCAC

751 CGTAAAATTG TCAACACCAC CGAGCTGCTC CTGACCACCG CCGCCAAGCT

801 GCAATCTCCC AAACTCAACG GCAGCGAAAT CCGGCTGCTT GACCGCCACT

851 TCACACTGCT CCAAACCGAC CTGCAACAAA CCGTCGCCCT TATCAACGGC

901 AGACACGCCC GCCGCATCCG CATCGACACC GCCATCAACC CCGAACTGGA

951 AGCCCTCGCC GAACACCTCC ACTACCAATG GCAGGGCTTC CTCTGGCTCA

1001 GCACCAATAT GCGTCAGGAA ATTTCCGCCC TCGTCATCCT GCTGCAACGC

1051 ACCCGCCGCA AATGGCTGGA TGCCCACGAA CGCCAACACC TGCGCCAAAG

1101 CCTGCTTGAA ACACGGGAAC ACAGTTGA
This encodes a protein having amino acid sequence <SEQ ID 634>:



1 MNTSQRNRLV SRWLNSYERY RYRRLIHAVR LGGAVLFATA SARLLHLQHG 

51 EWIGMTVFVV LGMLQFQGAI YSKAVERMLG TVIGLGAGLG VLWLNQHYFH

101 GNLLFYLTVG TASALAGWAA VGKNGYVPML AGLTMCMLIG DNGSEWFDSG 

151 LMRAMNVLIG AAIAIAAAKL LPLKSTLMWR FMLADNLTDC SKMIAEISNG

201 RRMTRERLEE NMAKMRQINA RMVKSRSHLA ATSGESRISP AMMEAMQHAH 

251 RKIVNTTELL LTTAAKLQSP KLNGSEIRLL DRHFTLLQTD LQQTVALING 

301 RHARRIRIDT AINPELEALA EHLHYQWQGF LWLSTNMRQE ISALVILLQR 

351 TRRKWLDAHE RQHLRQSLLE TREHS* 

ORF146a and ORF146-1 show 99.5% identity in 374 aa overlap: 

orf146a.pep MNTSQRNRLVSRWLNSYERYRYRRLIHAVRLGGAVLFATASARLLHLQHGEWIGMTVFVV
I M Ml I II MM MINIMI M I MM I i I Mil IM M I M MM I I INN II II I 
orf146-1 MNTSQRNRLVSRWLNSYERYRYRRLIHAVRLGGAVLFATASARLLHLQHGEWIGMTVFVV



orf146a.pep LGMLQFQGAIYSKAVERMLGTVIGLGAGLGVLWLNQHYFHGNLLFYLTVGTASALAGWAA
I M M M I M I II IMIIMI M M M II Mill II MM I I III II I M MIMMIII 
orf146-1 LGMLQFQGAIYSKAVERMLGTVIGLGAGLGVLWLNQHYFHGNLLFYLTVGTASALAGWAA

orf146a.pep VGKNGYVPMLAGLTMCMLIGDNGSEWFDSGLMRAMNVLIGAAIAIAAAKLLPLKSTLMWR
M II I M M M M M M M M M M I M M M M II M M M M I M M II I I M I M M 
orf146-1 VGKNGYVPMLAGLTMCMLIGDNGSEWLDSGLMRAMNVLIGAAIAIAAAKLLPLKSTLMWR



orf146a.pep FMLADNLTDCSKMIAEISNGRRMTRERLEENMAKMRQINARMVKSRSHLAATSGESRISP
I M M I I M I I I II II M M I M I II M II II II I I II I I II II II I M II M II M II I 
orf146-1 FMLADNLADCSKMIAEISNGRRMTRERLEENMAKMRQINARMVKSRSHLAATSGESRISP

orf146a.pep AMMEAMQHAHRKIVNTTELLLTTAAKLQSPKLNGSEIRLLDRHFTLLQTDLQQTVALING
MM M MM Mill lllllll MIMI MIMIIMM I II II II I III IMIMI III 
orf146-1 AMMEAMQHAHRKIVNTTELLLTTAAKLQSPKLNGSEIRLLDRHFTLLQTDLQQTVALING

orf146a.pep RHARRIRIDTAINPELEALAEHLHYQWQGFLWLSTNMRQEISALVILLQRTRRKWLDAHE
I I I I I I I I I I M I I I I I I I I I M I M I I I I I I I I II I I I I I I I I I I I II I II I M I I I II 
orf146-1 RHARRIRIDTAINPELEALAEHLHYQWQGFLWLSTNMRQEISALVILLQRTRRKWLDAHE

orf146a.pep RQHLRQSLLETREHSX

I I I 1 E I I j J I I [ I [ : 
orf146-1 RQHLRQSLLETREHGX

Homology with a predicted ORF from N. gonorrhoeae 

ORF146 shows 97.3% identity over a 75aa overlap with a predicted ORF (ORF146ng) from 
N. gonorrhoeae: 

orf146.pep                                 RHARRIRIDTAINPELEALAEHLHYQWQGF 30

M I II I I I I I I 1 I I I I I I I I I I I I I II I I I
orf146ng   KLNGSEIRLLDRHFTLLQTDLQQTAALINGRHARRIRIDTAINPELEALAEHLHYQWQGF 364

orf146.pep LWLSTDMRQEISALVILLQRTRRKWLDAHERQHLRQSLLETREHG 75

II I I I : I II I I II I I I I I II I I I II I I M I I I I I I I I I I I I I I I
orf146ng   LWLSTNMRQEISALVIPLQRTRRKWLDAHERQHLRQSLLETREHG 409

An ORF146ng nucleotide sequence <SEQ ID 635> was predicted to encode a protein having amino
acid sequence <SEQ ID 636>: 



1 MSGVRFPSPA PIPSTDPPSG SLCFFTFPLQ TASDWNSSQR KRLSGRWLNS 

51 YERYRHRRLI HAVRLGGTVL FATALARLLH LQHGEWIGMT VFVVLGMLQF

101 QGAIYSNAVE RMLGTVIGLG AGLGVLWLNQ HYFHGNLLFY LTIGTASALA

151 GWAAVGKNGY VPMLAGLTMC MLIGDNGSEW LDSGLMRAMN VLIGAAIAIA

201 AAKLLPLKST LMWRFMLADN LADCSKMIAE ISNGRRMTRE RLEQNMVKMR

251 QINARMVKSR SHLAATSGES RISPSMMEAM QHAHRKIVNT TELLLTTAAK 

301 LQSPKLNGSE IRLLDRHFTL LQTDLQQTAA LINGRHARRI RIDTAINPEL 

351 EALAEHLHYQ WQGFLWLSTN MRQEISALVI PLQRTRRKWL DAHERQHLRQ 

401 SLLETREHG* 



Further work revealed the following gonococcal DNA sequence <SEQ ID 637>:
1 ATGAACTCCT CGCAACGCAA ACGCCTTTCC GgccGCTGGC TCAACTCCTA

51 CGAACGCTac cGCCaccGCC GCCTCATACA TGCCGTGCGG CTCGGCggaa 

101 ccgtCCTGTT CGCCACCGCA CTCGCCCGgc tACTCCACCT CCAacacggc 

151 gAATGGATAG GGAtgaCCGT CTTCGTCGTC CTCGGCATGC TCCAGTTCCA 

201 AGGCgcgatt tActccaacg cggtgGAacg taTGctcggt acggtcatcg 

251 ggctgGGCGC GGGTTTGGgc gTTTTATGGC TGAACCAGCA TTAtttccac 

301 ggcaacCTcc tcttctacct gaccatcggc acggcaagcg cactggccgg 

351 ctGGGCGGCG GTCGGCAAAA acggctacgt ccctatgctg GCGGGGctgA 

401 CGATGTGCAT gctcatcggc gACAACGGCA GCGAATGGCT CGACAGCGGC 

451 CTGATGCGCG CGATGAACGT CCTCATCGGC GCCGCCATCG CCATTGCCGC 

501 CGCCAAACTG CTGCCGCTGA AATCCACACT GATGTGGCGT TTCATGCTTG 

551 CCGACAACCT GGCCGACTGC AGCAAAATGA TTGCCGAAAT CAGCAACGGC 

601 AGGCGTATGA CGCGCGAACG TTTGGAGCAG AATATGGTCA AAATGCGCCA 

651 AATCAACGCA CGCATGGTCA AAAGCCGCAG CCACCTCGCC GCCACATCGG 

701 GCGAAAGCCG CATCAGCCCC TCCATGATGG AAGCCATGCA GCACGCCCAC 

751 CGCAAAATCG TCAACACCAC CGAGCTGCTC CTGACCACCG CCGCCAAGCT 

801 GCAATCTCCC AAACTCAACG GCAGCGAAAT CCGGCTGCTC GACCGCCACT 

851 TCACACTGCT CCAAACCGAC CTGCAACAAA CCGCCGCCCT CATCAACGGC 

901 AGACACGCCC GCCGCATCCG CATCGACACC GCCATCAACC CCGAACTGGA 

951 AGCCCTCGCC GAACACCTCC ACTACCAATG GCAGGGCTTC CTCTGGCTCA 

1001 GCACCAATAT GCGTCAGGAA ATTTCCGCCC TCGTCATCCT GCTGCAACGC 

1051 ACCCGCCGCA AATGGCTGGA TGCCCACGAA CGCCAACACC TGCGCCAAAG 

1101 CCTGCTTGAA ACACGGGAAC ACGGCTGA 

This corresponds to the amino acid sequence <SEQ ID 638; ORF146ng-1>:

1 MNSSQRKRLS GRWLNSYERY RHRRLIHAVR LGGTVLFATA LARLLHLQHG

51 EWIGMTVFVV LGMLQFQGAI YSNAVERMLG TVIGLGAGLG VLWLNQHYFH

101 GNLLFYLTIG TASALAGWAA VGKNGYVPML AGLTMCMLIG DNGSEWLDSG

151 LMRAMNVLIG AAIAIAAAKL LPLKSTLMWR FMLADNLADC SKMIAEISNG

201 RRMTRERLEQ NMVKMRQINA RMVKSRSHLA ATSGESRISP SMMEAMQHAH

251 RKIVNTTELL LTTAAKLQSP KLNGSEIRLL DRHFTLLQTD LQQTAALING

301 RHARRIRIDT AINPELEALA EHLHYQWQGF LWLSTNMRQE ISALVILLQR

351 TRRKWLDAHE RQHLRQSLLE TREHG*

ORF146ng-1 and ORF146-1 show 96.5% identity in a 375 aa overlap:

orf146-1.pep  MNTSQRNRLVSRWLNSYERYRYRRLIHAVRLGGAVLFATASARLLHLQHGEWIGMTVFVV
orf146ng-1    MNSSQRKRLSGRWLNSYERYRHRRLIHAVRLGGTVLFATALARLLHLQHGEWIGMTVFVV

orf146-1.pep  LGMLQFQGAIYSKAVERMLGTVIGLGAGLGVLWLNQHYFHGNLLFYLTVGTASALAGWAA
orf146ng-1    LGMLQFQGAIYSNAVERMLGTVIGLGAGLGVLWLNQHYFHGNLLFYLTIGTASALAGWAA

orf146-1.pep  VGKNGYVPMLAGLTMCMLIGDNGSEWLDSGLMRAMNVLIGAAIAIAAAKLLPLKSTLMWR
orf146ng-1    VGKNGYVPMLAGLTMCMLIGDNGSEWLDSGLMRAMNVLIGAAIAIAAAKLLPLKSTLMWR

orf146-1.pep  FMLADNLADCSKMIAEISNGRRMTRERLEENMAKMRQINARMVKSRSHLAATSGESRISP
orf146ng-1    FMLADNLADCSKMIAEISNGRRMTRERLEQNMVKMRQINARMVKSRSHLAATSGESRISP

orf146-1.pep  AMMEAMQHAHRKIVNTTELLLTTAAKLQSPKLNGSEIRLLDRHFTLLQTDLQQTVALING
orf146ng-1    SMMEAMQHAHRKIVNTTELLLTTAAKLQSPKLNGSEIRLLDRHFTLLQTDLQQTAALING

orf146-1.pep  RHARRIRIDTAINPELEALAEHLHYQWQGFLWLSTNMRQEISALVILLQRTRRKWLDAHE
orf146ng-1    RHARRIRIDTAINPELEALAEHLHYQWQGFLWLSTNMRQEISALVILLQRTRRKWLDAHE

orf146-1.pep  RQHLRQSLLETREHGX
orf146ng-1    RQHLRQSLLETREHGX
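The identity figure quoted for this alignment is a ratio of matching positions to aligned positions. As a rough illustration only, the Python sketch below computes percent identity for two pre-aligned sequences of equal length. The function name and the convention that gapped columns count toward the overlap but never as identities are assumptions for this example; the alignment program behind the figures above may count slightly differently. By the same arithmetic, 96.5% of a 375-residue overlap corresponds to roughly 362 identical positions.

```python
def percent_identity(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Percent identity of two pre-aligned sequences of equal length.

    Assumed convention: every aligned column counts toward the overlap,
    and a column is an identity only if both residues match and neither
    is a gap character.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    if not seq_a:
        raise ValueError("empty alignment")
    identical = sum(1 for a, b in zip(seq_a, seq_b) if a == b and a != gap)
    return 100.0 * identical / len(seq_a)


# First 12 aligned residues of ORF146-1 and ORF146ng-1 from the alignment above.
orf146_1 = "MNTSQRNRLVSR"
orf146ng_1 = "MNSSQRKRLSGR"
print(f"{percent_identity(orf146_1, orf146ng_1):.1f}% identity")
```

On these 12 columns, 8 positions match, so the sketch reports 66.7% identity.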

Furthermore, ORF146ng-1 shows homology with a hypothetical E. coli protein:

sp|P33011|YEEA_ECOLI HYPOTHETICAL 40.0 KD PROTEIN IN COBU-SBMC INTERGENIC REGION
>gi|1736674|gnl|PID|d1016553 (D90838) ORF_ID:o348#20; similar to [SwissProt
Accession Number P33011] [Escherichia coli] >gi|1736682|gnl|PID|d1016560 (D90839)
ORF_ID:o348#20; similar to [SwissProt Accession Number P33011] [Escherichia coli]
>gi|1788318 (AE000292) f352; 100% identical to fragment YEEA_ECOLI SW: P33011 but
has 203 additional C-terminal residues [Escherichia coli] Length = 352
Score = 109 bits (271), Expect = 2e-23

Identities = 89/347 (25%), Positives = 150/347 (42%), Gaps = 21/347 (6%)

Query: 20  YRHRRLIHAVRLGGTVLFATALARLLHLQHGEWIGMTVFVVLGMLQFQGAIYSNAVERML 79
           YRH R++H R+ L + RL + W +T+ V++G + F G + A ER+
Sbjct: 15  YRHYRIVHGTRVALAFLLTFLIIRLFTIPESTWPLVTMWIMGPISFWGNWPRAFERIG 74

Query: 80  GTVIGLGAGLGVLWLNQHYFHGNLLFYLTIGTASALAGWAAVGKNGYVPMLAGLTMCMLI 139
           GTV+G GL L L L + A L GW A+GK Y +L G+T+ +++

           E +D+ L R+ +V++G + P ++ + WR LA +L + +++

           + R RLE ++ K+ VK R +A S E+RI S+ E +Q +R +V
Sbjct: 191 PNLLERPRLESHLQKLL TDAVKMRGLIAPASKETRIPKSIYEGIQTINRNLVCMLEL 247

Query: 260 XXXXXXXXQSPK LNGSEIRLLDRHFXXXXXXXXXXAALINGRHARRIRIDTAINPEL 316
           + LN ++R D AL G +N +
Sbjct: 248 QINAYWATRPSHFVLLNAQKLR--DTQHMMQQILLSLVHALYEGNPQPVFANTEKLNDAV 305

Query: 317 EALAEHL--HYQWQ GFLWLSTNMRQEISALVILLQRTRRK 354
           E L + L H+ + G++WL+ ++ L L+ R RK
Sbjct: 306 EELRQLLNNHHDLKWETPIYGYVWLNMETAHQLELLSNLICRALRK 352

On the basis of this analysis, including the identification of several transmembrane domains in the 
gonococcal protein, it is predicted that the proteins from N. meningitidis and N. gonorrhoeae, and 
their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies. 






Example 76 



The following partial DNA sequence was identified in N. meningitidis <SEQ ID 639> 



1 ..GCCGAAGACA CGCGCGTTAC CGCACAGCTT TTGAGCGCGT ACGGCATTCA 

51 GGGCAAACTC GTCAGTGTGC GCGAACACAA CGAACGGCAG ATGGCGGACA 

101 AGATTGTCGG CTATCTTTCA GACGGCATGG TTGTGGCACA GGTTTCCGAT 

151 GCGGGTACGC CGGCCGTGTG CGACCCGGGC GCGAAACTCG CCCGCCGCGT 

201 GCGTGAGGCC GGGTTTAAAG TCGTTCCCGT CGTGGGCGCA AC.GCGGTGA 

251 TGGCGGCTTT GAGCGTGGCC GGTGTGGAAG GATCCGATTT TTATTTCAAC 

301 GGTTTTGTAC CGCCGAAATC GGGAGAACGC AGGAAACTGT TTGCCAAATG 

351 GGTGCGGGCG GCGTTTCCTA TCGTCATGTT TGAAACGCCG CACCGCATCG 

401 GTGCAGCGCT TGCCGATATG GCGGAACTGT TCCCCGAACG CCGATTAATG 

451 CTGGCGCGCG AAATTACGAA AACGTTTGAA ACGTTCTTAA GCGGCACGGT 

501 TGGGGAAATT CAGACGGCAT TGTCTGCCGA CGGCGACCAA TCGCGCGGCG 

551 AGATGGTGTT GGTGCTTTAT CCGGCGCAGG ATGAAAAACA CGAAGGCTTG 

601 TCCGAGTCCG CGCAAAACAT CATGAAAATC CTCACAGCCG AGCTGCCGAC 

651 CAAACAGGCG GCGGAGCTTG CTGCCAAAAT CACGGGCGAG GGAAAGAAAG 

701 CTTTGTACGA T..

This corresponds to the amino acid sequence <SEQ ID 640; ORF147>: 



1 ..AEDTRVTAQL LSAYGIQGKL VSVREHNERQ MADKIVGYLS DGMVVAQVSD

51 AGTPAVCDPG AKLARRVREA GFKVVPVVGA XAVMAALSVA GVEGSDFYFN

101 GFVPPKSGER RKLFAKWVRA AFPIVMFETP HRIGAALADM AELFPERRLM

151 LAREITKTFE TFLSGTVGEI QTALSADGDQ SRGEMVLVLY PAQDEKHEGL

201 SESAQNIMKI LTAELPTKQA AELAAKITGE GKKALYD..

Further work revealed the complete nucleotide sequence <SEQ ID 641>:



1 ATGTTTCAGA AACATTTGCA GAAAGCCTCC GACAGCGTCG TCGGAGGGAC 

51 ATTATACGTG GTTGCCACGC CCATCGGCAA TTTGGCGGAC ATTACCCTGC 

101 GCGCTTTGGC GGTATTGCAA AAGGCGGACA TCATCTGTGC CGAAGACACG 

151 CGCGTTACCG CACAGCTTTT GAGCGCGTAC GGCATTCAGG GCAAACTCGT 
201 CAGTGTGCGC GAACACAACG AACGGCAGAT GGCGGACAAG ATTGTCGGCT 

251 ATCTTTCAGA CGGCATGGTT GTGGCACAGG TTTCCGATGC GGGTACGCCG 

301 GCCGTGTGCG ACCCGGGCGC GAAACTCGCC CGCCGCGTGC GTGAGGCCGG 

351 GTTTAAAGTC GTTCCCGTCG TGGGCGCAAG CGCGGTGATG GCGGCTTTGA 

401 GCGTGGCCGG TGTGGAAGGA TCCGATTTTT ATTTCAACGG TTTTGTACCG 

451 CCGAAATCGG GAGAACGCAG GAAACTGTTT GCCAAATGGG TGCGGGCGGC 

501 GTTTCCTATC GTCATGTTTG AAACGCCGCA CCGCATCGGT GCGACGCTTG 

551 CCGATATGGC GGAACTGTTC CCCGAACGCC GATTAATGCT GGCGCGCGAA 

601 ATTACGAAAA CGTTTGAAAC GTTCTTAAGC GGCACGGTTG GGGAAATTCA 

651 GACGGCATTG TCTGCCGACG GCAACCAATC GCGCGGCGAG ATGGTGTTGG 

701 TGCTTTATCC GGCGCAGGAT GAAAAACACG AAGGCTTGTC CGAGTCCGCG 

751 CAAAACATCA TGAAAATCCT CACAGCCGAG CTGCCGACCA AACAGGCGGC 

801 GGAGCTTGCT GCCAAAATCA CGGGCGAGGG AAAGAAAGCT TTGTACGATC 

851 TGGCTCTGTC TTGGAAAAAC AAATAG 

This corresponds to the amino acid sequence <SEQ ID 642; ORF147-1>:



1 MFQKHLQKAS DSVVGGTLYV VATPIGNLAD ITLRALAVLQ KADIICAEDT

51 RVTAQLLSAY GIQGKLVSVR EHNERQMADK IVGYLSDGMV VAQVSDAGTP

101 AVCDPGAKLA RRVREAGFKV VPVVGASAVM AALSVAGVEG SDFYFNGFVP

151 PKSGERRKLF AKWVRAAFPI VMFETPHRIG ATLADMAELF PERRLMLARE

201 ITKTFETFLS GTVGEIQTAL SADGNQSRGE MVLVLYPAQD EKHEGLSESA

251 QNIMKILTAE LPTKQAAELA AKITGEGKKA LYDLALSWKN K*
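The step from a nucleotide listing to the corresponding amino acid listing is a codon-by-codon translation. The Python sketch below illustrates it, assuming the standard genetic code (NCBI translation table 1) and reading frame 1. Codons containing ambiguity codes or dots, as found in some of the partial sequences in this document, are rendered as X. The table construction and the function name are illustrative and are not taken from the source.

```python
from itertools import product

# Standard genetic code (NCBI table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate frame 1 of a DNA string; '*' marks a stop codon.

    Whitespace (as in the 10-base blocks of the listings) is ignored, and any
    codon that is not a plain A/C/G/T triplet is translated as 'X'.
    """
    seq = "".join(dna.split()).upper()
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


# Start and end of <SEQ ID 641>, written in the listing's 10-base blocks.
print(translate("ATGTTTCAGA AACATTTGCA GAAAGCCTCC"))        # MFQKHLQKAS
print(translate("TTGTACGATC TGGCTCTGTC TTGGAAAAAC AAATAG"))  # LYDLALSWKNK*
```

The two calls reproduce the first ten residues and the last eleven residues (plus the stop) of ORF147-1 <SEQ ID 642>.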

Computer analysis of this amino acid sequence gave the following results: 

Homology with hypothetical protein ORF286 of E. coli (accession number U18997)
ORF147 and E. coli ORF286 protein show 36% aa identity in a 237 aa overlap:



Orf147:   1 AEDTRVTAQLLSAYGIQGKLVSVREHNERQMADKIVGYLSDGMVVAQVSDAGTPAVCDPG 60
            AEDTR T LL +GI +L ++ +HNE+Q A+ ++ L +G +A VSDAGTP + DPG
Orf286:  43 AEDTRHTGLLLQHFGINARLFALHDHNEQQKAETLLAKLQEGQNIALVSDAGTPLINDPG 102

Orf147:  61 AKLARRVREXXXXXXXXXXXXXXXXXXXXXXXEGSDFYFNGFVPPKSGERRKLFAKWVRA 120
            L R RE F + GF+P KS RR

            ++ +E+ HR+ +L D+ + E R ++LARE+TKT+ET VGE+ + D +

            + +GEMVL++ + E L A + +L AELP K+AA LAA+I G K ALY
Orf286: 223 RRKGEMVLIV-EGHKAQEEDLPADALRTLALLQAELPLKKAAALAAEIHGVKKNALY 278

Homology with a predicted ORF from N. meningitidis (strain A)

ORF147 shows 96.6% identity over a 237 aa overlap with ORF75a from strain A of N. meningitidis:

10 20 30
orf147.pep                                  AEDTRVTAQLLSAYGIQGKLVSVREHNERQ
orf75a        TLYVVATPIGNLADITLRALAVLQKADIICAEDTRVTAQLLSAYGIQGKLVSVREHNERQ
20 30 40 50 60 70






40 50 60 70 80 90
orf147.pep    MADKIVGYLSDGMVVAQVSDAGTPAVCDPGAKLARRVREAGFKVVPVVGAXAVMAALSVA
orf75a        MADKIVGYLSDGMVVAQVSDAGTPAVCDPGAKLARRVREVGFKVVPVVGASAVMAALSVA
80 90 100 110 120 130

100 110 120 130 140 150
orf147.pep    GVEGSDFYFNGFVPPKSGERRKLFAKWVRAAFPIVMFETPHRIGAALADMAELFPERRLM
orf75a        GVAGSDFYFNGFVPPKSGERRKLFAKWVRVAFPVVMFETPHRIGATLADMAELFPERRLM
140 150 160 170 180 190

160 170 180 190 200 210
orf147.pep    LAREITKTFETFLSGTVGEIQTALSADGDQSRGEMVLVLYPAQDEKHEGLSESAQNIMKI
orf75a        LAREITKTFETFLSGTVGEIQTALAADGNQSRGEMVLVLYPAQDEKHEGLSESAQNIMKI
200 210 220 230 240 250

220 230
orf147.pep    LTAELPTKQAAELAAKITGEGKKALYD
orf75a        LTAELPTKQAAELAAKITGEGKKALYDLALSWKNKX
260 270 280 290

ORF147a is identical to ORF75a, which includes aa 56-292 of ORF75. 
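Residue ranges such as "aa 56-292" are 1-based and inclusive, so the span contains 292 - 56 + 1 = 237 residues, the same length as the 237 aa overlap quoted for the ORF147/ORF75a comparison. When such coordinates are used to cut a sequence in Python, whose slices are 0-based and end-exclusive, the start has to be shifted down by one. A small sketch; both helper names are hypothetical.

```python
def span_length(start: int, end: int) -> int:
    """Length of a 1-based, inclusive residue range such as 'aa 56-292'."""
    if start < 1 or end < start:
        raise ValueError("expected 1 <= start <= end")
    return end - start + 1


def extract(seq: str, start: int, end: int) -> str:
    """Return residues start..end (1-based, inclusive) of a sequence."""
    span_length(start, end)  # validate the range
    if end > len(seq):
        raise ValueError("range extends past the end of the sequence")
    return seq[start - 1:end]


print(span_length(56, 292))  # 237
# Residues 1-10 of ORF147-1 <SEQ ID 642>:
print(extract("MFQKHLQKASDSVVGGTLYV", 1, 10))  # MFQKHLQKAS
```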
Homology with a predicted ORF from N. gonorrhoeae 

ORF147 shows 94.1% identity over a 237 aa overlap with a predicted ORF (ORF147ng) from N. gonorrhoeae:

orf147.pep                                  AEDTRVTAQLLSAYGIQGKLVSVREHNERQ 30
orf147ng      TLYVVATPIGNLADITLRALAVLQKADIICAEDTRVTAQLLSAYGIQGRLVSVREHNERQ 85

orf147.pep    MADKIVGYLSDGMVVAQVSDAGTPAVCDPGAKLARRVREAGFKVVPVVGAXAVMAALSVA 90
orf147ng      MADKVIGFLSDGLVVAQVSDAGTPAVCDPGAKLARRVREAGFKVVPVVGASAVMAALSVA 145

orf147.pep    GVEGSDFYFNGFVPPKSGERRKLFAKWVRAAFPIVMFETPHRIGAALADMAELFPERRLM 150
orf147ng      GVAESDFYFNGFVPPKSGERRKLFAKWVRAAFPVVMFETPHRIGATLADMAELFPERRLM 205

orf147.pep    LAREITKTFETFLSGTVGEIQTALSADGDQSRGEMVLVLYPAQDEKHEGLSESAQNIMKI 210
orf147ng      LAREITKTFETFLSGTVGEIQTALAADGNQSRGEMVLVLYPAQDEKHEGLSESAQNAMKI 265

orf147.pep    LTAELPTKQAAELAAKITGEGKKALYD 237
orf147ng      LAAELPTKQAAELAAKITGEGKKALYDLALSWKNK 300



An ORF147ng nucleotide sequence <SEQ ID 643> was predicted to encode a protein having amino
acid sequence <SEQ ID 644>:



1 MSVFQTAFFM FQKHLQKASD SVVGGTLYVV ATPIGNLADI TLRALAVLQK

51 ADIICAEDTR VTAQLLSAYG IQGRLVSVRE HNERQMADKV IGFLSDGLVV

101 AQVSDAGTPA VCDPGAKLAR RVREAGFKVV PVVGASAVMA ALSVAGVAES

151 DFYFNGFVPP KSGERRKLFA KWVRAAFPVV MFETPHRIGA TLADMAELFP

201 ERRLMLAREI TKTFETFLSG TVGEIQTALA ADGNQSRGEM VLVLYPAQDE

251 KHEGLSESAQ NAMKILAAEL PTKQAAELAA KITGEGKKAL YDLALSWKNK

301 *

Further work revealed the following gonococcal DNA sequence <SEQ ID 645>: 



1 ATGTTTCAGA AACACTTGCA GAAAGCCTCC GACAGCGTCG TCGGAGGGAC 

51 ATTATACGTG GTTGCCACGC CCATCGGCAA TTTGGCAGAC ATTACCCTGC 

101 GCGCTTTGGC GGTATTGCAA AAGGCGGACA TCATTTGTGC CGAAGACACG 

151 CGCGTTACTG CGCAGCTTTT GAGCGCGTAC GGCATTCAGG GCAGGTTGGT 

201 CAGTGTGCGC GAACACAACG AGCGGCAGAT GGCGGACAAG GTAATCGGTT 

251 TCCTTTCAGA CGGCCTGGTT GTGGCGCAGG TTTCCGATGC GGGTACGCCG 

301 GCCGTGTGCG ACCCGGGCGC GAAACTCGCC CGCCGCGTGC GCGAAGCAGG 

351 GTTCAAAGTC GTTCCCGTCG TGGGCGCAAG CGCGGTAATG GCGGCGTTGA 

401 GTGTGGCCGG TGTGGCGGAA TCCGATTTTT ATTTCAACGG TTTTGTACCG 

451 CCGAAATCGG GCGAACGTAG GAAATTGTTT GCCAAATGGG TGCGGGCGGC 

501 ATTTCCTGTC GTCATGTTTG AAACGCCGCA CCGAATCGGG GCAACGCTTG 

551 CCGATATGGC GGAATTGTTC CCCGAACGCC GTCTGATGCT GGCGCGCGAA 

601 ATCACGAAAA CGTTTGAAAC GTTCTTAAGC GGCACGGTTG GGGAAATTCA 

651 GACGGCATTG GCGGCGGACG GCAACCAATC GCGCGGCGAG ATGGTGTTGG 

701 TGCTTTATCC GGCGCAGGAT GAAAAACACG AAGGCTTGTC CGAGTCTGCG 

751 CAAAATGCGA TGAAAATCCT TGCGGCCGAG CTGCCGACCA AGCAGGCGGC 

801 GGAGCTTGCC GCCAAGATTA CAGGTGAGGG CAAAAAGGCT TTGTACGATT 

851 TGGCACTGTC GTGGAAAAAC AAATGA 
This corresponds to the amino acid sequence <SEQ ID 646; ORF147ng-1>:

1 MFQKHLQKAS DSVVGGTLYV VATPIGNLAD ITLRALAVLQ KADIICAEDT

51 RVTAQLLSAY GIQGRLVSVR EHNERQMADK VIGFLSDGLV VAQVSDAGTP

101 AVCDPGAKLA RRVREAGFKV VPVVGASAVM AALSVAGVAE SDFYFNGFVP

151 PKSGERRKLF AKWVRAAFPV VMFETPHRIG ATLADMAELF PERRLMLARE

201 ITKTFETFLS GTVGEIQTAL AADGNQSRGE MVLVLYPAQD EKHEGLSESA

251 QNAMKILAAE LPTKQAAELA AKITGEGKKA LYDLALSWKN K*

ORF147ng shows homology to a hypothetical E. coli protein:

sp|P45528|YRAL_ECOLI HYPOTHETICAL 31.3 KD PROTEIN IN AGAI-MTR INTERGENIC REGION
(F286)

>gi|606086 (U18997) ORF_f286 [Escherichia coli]

>gi|1789535 (AE000395) hypothetical 31.3 kD protein in agai-mtr intergenic region
[Escherichia coli] Length = 286
Score = 218 bits (550), Expect = 3e-56

Identities = 128/284 (45%), Positives = 171/284 (60%), Gaps = 4/284 (1%)

Query: 4   KHLQKASDSVVGGTLYVVATPIGNLADITLRALAVLQKADIICAEDTRVTAQLLSAYGIQ 63
            K Q A +S G LY+V TPIGNLADIT RAL VLQ D+I AEDTR T LL +GI
Sbjct: 2   KQHQSADNSQ--GQLYIVPTPIGNLADITQRALEVLQAVDLIAAEDTRHTGLLLQHFGIN 59

Query: 64  GRLVSVREHNERQMADKVIGFLSDGLVVAQVSDAGTPAVCDPGAKLARRVREAGFKVVPV 123
            RL ++ +HNE+Q A+ ++ L +G +A VSDAGTP + DPG L R REAG +VVP+
Sbjct: 60  ARLFALHDHNEQQKAETLLAKLQEGQNIALVSDAGTPLINDPGYHLVRTCREAGIRVVPL 119

Query: 124 VGASAVMAALSVAGVAESDFYFNGFVPPKSGERRKLFAKWVRAAFPVVMFETPHRIGATL 183
            G A + ALS AG+ F + GF+P KS RR ++ +E+ HR+ +L
Sbjct: 120 PGPCAAITALSAAGLPSDRFCYEGFLPAKSKGRRDALKAIEAEPRTLIFYESTHRLLDSL 179

Query: 184 ADMAELFPERR-LMLAREITKTFETFLSGTVGEIQTALAADGNQSRGEMVLVLYPAQDEK 242
            D+ + E R ++LARE+TKT+ET VGE+ + D N+ +GEMVL++ +
Sbjct: 180 EDIVAVLGESRYVVLARELTKTWETIHGAPVGELLAWVKEDENRRKGEMVLIV-EGHKAQ 238

Query: 243 HEGLSESAQNAMKILAAELPTKQAAELAAKITGEGKKALYDLAL 286
            EL A + +L AELP K+AA LAA+I G K ALY AL
Sbjct: 239 EEDLPADALRTLALLQAELPLKKAAALAAEIHGVKKNALYKYAL 282
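The percentages in the summary line of the YRAL_ECOLI report above are the raw counts divided by the alignment length (284 columns), printed as whole numbers. The sketch below reproduces those three figures. It truncates toward zero, which is an assumption about the display convention; for these particular counts, rounding to the nearest integer gives the same values.

```python
def blast_percent(count: int, align_len: int) -> int:
    """Whole-number percentage as printed in a BLAST summary line.

    Truncation toward zero is an assumed display convention.
    """
    return (100 * count) // align_len


ALIGN_LEN = 284  # alignment length reported for ORF147ng-1 vs YRAL_ECOLI
for label, count in (("Identities", 128), ("Positives", 171), ("Gaps", 4)):
    print(f"{label} = {count}/{ALIGN_LEN} ({blast_percent(count, ALIGN_LEN)}%)")
```

This prints `Identities = 128/284 (45%)`, `Positives = 171/284 (60%)` and `Gaps = 4/284 (1%)`, matching the report.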

Based on the computer analysis and the presence of a putative transmembrane domain in the 
gonococcal protein, it is predicted that these proteins from N. meningitidis and N. gonorrhoeae, and 
their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies. 
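The source does not say how the transmembrane domain was predicted. One classic approach, used here purely for illustration, is a Kyte-Doolittle hydropathy scan. The scan averages a per-residue hydropathy value over a sliding window (19 residues is the width commonly used for membrane-spanning helices) and flags windows whose mean is at least about 1.6. The function names, the window and threshold defaults, and the choice to score unknown residues as 0 are all assumptions of this sketch.

```python
# Kyte & Doolittle (1982) hydropathy values per residue.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein: str, window: int = 19) -> list[float]:
    """Mean hydropathy of each full window; unknown residues (X, *) score 0."""
    scores = [KYTE_DOOLITTLE.get(aa, 0.0) for aa in protein.upper()]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


def hydrophobic_segments(protein: str, window: int = 19,
                         threshold: float = 1.6) -> list[tuple[int, int]]:
    """Merge above-threshold windows into 1-based, inclusive (start, end) spans."""
    segments: list[tuple[int, int]] = []
    for i, mean in enumerate(hydropathy_profile(protein, window)):
        if mean < threshold:
            continue
        start, end = i + 1, i + window
        if segments and start <= segments[-1][1] + 1:
            segments[-1] = (segments[-1][0], end)  # extend the current span
        else:
            segments.append((start, end))
    return segments


# Toy sequence: a 21-residue leucine stretch flanked by lysines.
print(hydrophobic_segments("K" * 10 + "L" * 21 + "K" * 10))  # [(6, 36)]
```

The reported span is wider than the leucine stretch itself, because any window containing at least 14 leucines already averages above the threshold. A window-based scan therefore only locates candidate regions approximately.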



Example 77 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 647> 

1 ATGAAAACAA CCGACAAACG GACAACCGAA ACACACCGCA AAGCCCCGAA 

51 AACCGGTCGC ATCCGCTTCT C.GCTGCTTA CTTAGCCATA TGCCTGTCGT 

101 TCGGCATTCT TCCCCAAGCC TGGGCGGGAC ACACTTATTT CGGCATCAAC 

151 TACCAATACT ATCGCGACTT TGCCGAAAAT AAAGGCAAGT TTGCAGTCGG 

201 GGCGAAAGAT ATTGAGGTTT ACAACAAAAA AGGGGAGTTG GTCGGCAAAT 

251 CAATGACAAA AGCCCCGATG ATTGATTTTT CTGTGGTGTC GCGTAACGGC 

301 GTGGCGGcAT TGGTGGGCGt ATCAATATAT TGTGAGCGTG GCACATAACG 

351 GCGGCTATAA CAACGTTGAT TTTGGTGCGG AAGGAAk.AA tATCCC.GAT 

401 CAACAwCGww TTACTTATAA AATTGTGAAA CGGAATAATT ATAAAGCAGG 

451 GACTAAAGGC CATCCTTATG GCGGCGATTA TCATATGCCG CGTTTGCATA 

501 AATwTGTCAC AGATGCAGAA CCTGTTGAAA TGACCAGTTA TATGGATGGG 

551 CGGAAATATA TCGATCAAAA TAATTACCCT GACCGTGTTC GTATTGGGGC 

601 AGGCAGGCAA TATTGGCGAT CTGATGAAGA TGAGCCCAAT AACCGCGAAA. 

651 GTTCATATCA TATTGCAAGT 

701 GGCTC ACCAATGTTT ATCTATGATG CCCAAAAGCA 

751 AAAGTGGTTA ATTAATGGGG TATTGCAAAC GGGCAACCCC TATATAGGAA 

801 AAAGCAATGG CTTCCAGCTG GTTCGTAAAG ATTGGTTCTA TGATGAAATC 

851 TTTGCTGGAG ATACCCATTC AGTATTCTAC GAACCACGTC AAAATGGGAA 

901 ATACTCTTTT AACGACGATA ATAATGGCAC AGGAAAAATC AATGCCAAAC 
951 ATGAACACAA TTCTCTGCCT AATAGATTAA AAACACGAAC CGTTCAATTG

1001 TTTAATGTTT CTTTATCCGA GACAGCAAGA GAACCTGTTT ATCATGCTGC

1051 AGGTGGTGTC AACAGTTATC GACCCAGACT GAATAATGGA GAAAATATTT

1101 CCTTTATTGA CGAAGGAAAA GGCGAATTGA TACTTACCAG CAACATCAAT

1151 CAAGGTGCTG GAGGATTATA TTTCCAAGGA GATTTTACGG TCTCGCCTGA

1201 AAATAACGAA ACTTGGCAAG GCGCGGGCGT TCATATCAGT GAAGACAGTA

1251 CCGTTACTTG GAAAGTAAAC GGCGTGGCAA ACGACCGCCT GTCCAAAATC

1301 GGCAAAGGCA CGCTG

//

2101            ...GATAAAG TGACTGCTTC ATTGACTAAG ACCGACATCA

2151 GCGGCAATGT CGATCTTGCC GATCACGCTC ATTTAAATCT CACAGGGCTT

2201 GCCACACTCA ACGGCAATCT TAGTGCAAAT GGCGATACAC GTTATACAGT

2251 CAGCCACAAC GCCACCCAAA ACGGCAACCk TAgCCtCGtG G.sAATGcCC

2301 AAGCAACATT TAATCAAGCC ACATTAAACG GCAACACATC GGCTTCgGGC

2351 AATGCTTCAT TTAATCTAAG CGACCACGCC GTACAAAACG GCAGTCTGAC

2401 GCTTTCCGGC AACGCTAAGG CAAACGTAAG CCATTCCGCA CTCAACGGTA

2451 ATGTCTCCCT AGCCGATAAG GCAGTATTCC ATTTTGAAAG CAGCCGCTTT

2501 ACCGGACAAA TCAGCGGCGG CAagGATACG GCATTACACT TAAAAGACAG

2551 CGAATGGACG CTGCCGTCAg GarCGGAATT AGGCAATTTA AACCTTGACA

2601 ACGCCACCAT TACaCTCAAT TCCGCCTATC GCCACGATGC GGCAGGGGCG

2651 CAAACCGGCA GTGCGACAGA TGCGCCGCGC CGCCGTTCGC GCCGTTCGCG

2701 CCGTTCCCTA TTATmCGTTA CACCGCCAAC TTCGGTAGAA TCCCGTTTCA

2751 ACACGCTGAC GGTAAACGGC AAATTGAACG GTCAGGGAAC ATTCCGCTTT

2801 ATGTCGGAAC TCTTCGGCTA CCGCAGCGAC AAATTGAAGC TGGCGGAAAG

2851 TTCCGAAGGC ACTTACACCT TGGCGGTCAA CAATACCGGC AACGAACCTG

2901 CAAGCCTCGA ACAATTGACG GTAGTGGAAG GAAAAGACAA CAAACCGCTG

2951 TCCGAAAACC TTAATTTCAC CCTGCAAAAC GAACACGTCG ATGCAGGCGC

3001 GTGG

//

3551                                             ....TTAGAC

3601 CGCGTATTTG CCGAAGACCG CCGCAACGCC GTTTGGACAA GCGGCATCCG

3651 GGACACCAAA CACTACCGTT CGCAAGATTT CCGCGCCTAC CGCCAACAAA

3701 CCGACCTGCG CCAAATCGGT ATGCAGAAAA ACCTCGGCAG CGGGCGCGTC

3751 GGCATCCTGT TTTCGCACAA CCGGACCGAA AACACCTTCG ACGACGGCAT

3801 CGGCAACTCG GCACGGCTTG CCCACGGCGC CGTTTTCGGG CAATACGGCA

3851 TCGACAGGTT CTACATCGGC ATCAGnCGCG GGCGCGGGTT TTAGCAGCGG

3901 CAGCCTTTcA GACGGCATCG GAGsmAAAwT CCGCCGCCGC GTGCtGCATT

3951 ACGGCATTCA GGCACGAtAC CGCGCCGgtt tCggCGgATt CGGCATCGAA

4001 CCGCACATCG GCGCAACGCg ctATTTCGTC CAAAAAGCGG ATTACCGCTA

4051 CGAAAACGTC AATATCGCCA CCCCCGGCCT TGCATTCAAC CGcTACCGCG

4101 CGGGCATTAa GGCAGATTAT TCATTCAAAC CGGCGCAACA CATTTCCATC

4151 ACGCCTTATT TGAGCCTGTC CTATACCGAT GCCGCTTCGG GCAAAGTCCG

4201 AACACGCGTC AATACCGCCG TATTGGCTCA GGATTTCGGC AAAACCCGCA

4251 GTGCGGAATG GGgCGTAAAC GCCGAAATCA AAGGTTTCAC GCTGTCCCTC

4301 CACGCTGCCG CCGCCAAAGG CCCGCAACTG GAAGCGCAAC ACAGCGCGGG

4351 CATCAAATTA GGCTACCGCT GGTAA...



This corresponds to the amino acid sequence <SEQ ID 648; ORF1>:



1 MKTTDKRTTE THRKAPKTGR IRFXAAYLAI CLSFGILPQA WAGHTYFGIN 

51 YQYYRDFAEN KGKFAVGAKD IEVYNKKGEL VGKSMTKAPM IDFSVVSRNG 

101 VAALVGVQYI VSVAHNGGYN NVDFGAEGXN IXDQXRXTYK IVKRNNYKAG 

151 TKGHPYGGDY HMPRLHKXVT DAEPVEMTSY MDGRKYIDQN NYPDRVRIGA 

201 GRQYWRSDED EPNNRESSYH IAS GS PMFIYDAQKQ 

251 KWLINGVLQT GNPYIGKSNG FQLVRKDWFY DEIFAGDTHS VFYEPRQNGK 

301 YSFNDDNNGT GKINAKHEHN SLPNRLKTRT VQLFNVSLSE TAREPVYHAA 

351 GGVNSYRPRL NNGENISFID EGKGELILTS NINQGAGGLY FQGDFTVSPE 

401 NNETWQGAGV HISEDSTVTW KVNGVANDRL SKIGKGTL 

// 

701 DKVTAS LTKTDISGNV DLADHAHLNL TGLATLNGNL 

751 SANGDTRYTV SHNATQNGNX SLVXNAQATF NQATLNGNTS ASGNASFNLS 

801 DHAVQNGSLT LSGNAKANVS HSALNGNVSL ADKAVFHFES SRFTGQISGG 

851 KDTALHLKDS EWTLPSGXEL GNLNLDNATI TLNSAYRHDA AGAQTGSATD 

901 APRRRSRRSR RSLLXVTPPT SVESRFNTLT VNGKLNGQGT FRFMSELFGY 

951 RSDKLKLAES SEGTYTLAVN NTGNEPASLE QLTVVEGKDN KPLSENLNFT 

1001 LQNEHVDAGA W 

// 

1151 LDRVFAEDR 

1201 RNAVWTSGIR DTKHYRSQDF RAYRQQTDLR QIGMQKNLGS GRVGILFSHN 

1251 RTENTFDDGI GNSARLAHGA VFGQYGIDRF YIGISAGAGF SSGSLSDGIG 

1301 XKXRRRVLHY GIQARYRAGF GGFGIEPHIG ATRYFVQKAD YRYENVNIAT 

1351 PGLAFNRYRA GIKADYSFKP AQHISITPYL SLSYTDAASG KVRTRVNTAV 
1401 LAQDFGKTRS AEWGVNAEIK GFTLSLHAAA AKGPQLEAQH SAGIKLGYRW 

1451 * 

Further sequencing analysis revealed the complete nucleotide sequence <SEQ ID 649>:

1 ATGAAAACAA CCGACAAACG GACAACCGAA ACACACCGCA AAGCCCCGAA 

51 AACCGGCCGC ATCCGCTTCT CGCCTGCTTA CTTAGCCATA TGCCTGTCGT 

101 TCGGCATTCT TCCCCAAGCC TGGGCGGGAC ACACTTATTT CGGCATCAAC 

151 TACCAATACT ATCGCGACTT TGCCGAAAAT AAAGGCAAGT TTGCAGTCGG 

201 GGCGAAAGAT ATTGAGGTTT ACAACAAAAA AGGGGAGTTG GTCGGCAAAT 

251 CAATGACAAA AGCCCCGATG ATTGATTTTT CTGTGGTGTC GCGTAACGGC 

301 GTGGCGGCAT TGGTGGGCGA TCAATATATT GTGAGCGTGG CACATAACGG 

351 CGGCTATAAC AACGTTGATT TTGGTGCGGA AGGAAGAAAT CCCGATCAAC 

401 ATCGTTTTAC TTATAAAATT GTGAAACGGA ATAATTATAA AGCAGGGACT 

451 AAAGGCCATC CTTATGGCGG CGATTATCAT ATGCCGCGTT TGCATAAATT 

501 TGTCACAGAT GCAGAACCTG TTGAAATGAC CAGTTATATG GATGGGCGGA 

551 AATATATCGA TCAAAATAAT TACCCTGACC GTGTTCGTAT TGGGGCAGGC 

601 AGGCAATATT GGCGATCTGA TGAAGATGAG CCCAATAACC GCGAAAGTTC 

651 ATATCATATT GCAAGTGCGT ATTCTTGGCT CGTTGGTGGC AATACCTTTG 

701 CACAAAATGG ATCAGGTGGT GGCACAGTCA ACTTAGGTAG TGAAAAAATT 

751 AAACATAGCC CATATGGTTT TTTACCAACA GGAGGCTCAT TTGGCGACAG 

801 TGGCTCACCA ATGTTTATCT ATGATGCCCA AAAGCAAAAG TGGTTAATTA 

851 ATGGGGTATT GCAAACGGGC AACCCCTATA TAGGAAAAAG CAATGGCTTC 

901 CAGCTGGTTC GTAAAGATTG GTTCTATGAT GAAATCTTTG CTGGAGATAC 

951 CCATTCAGTA TTCTACGAAC CACGTCAAAA TGGGAAATAC TCTTTTAACG 

1001 ACGATAATAA TGGCACAGGA AAAATCAATG CCAAACATGA ACACAATTCT 

1051 CTGCCTAATA GATTAAAAAC ACGAACCGTT CAATTGTTTA ATGTTTCTTT 

1101 ATCCGAGACA GCAAGAGAAC CTGTTTATCA TGCTGCAGGT GGTGTCAACA 

1151 GTTATCGACC CAGACTGAAT AATGGAGAAA ATATTTCCTT TATTGACGAA 

1201 GGAAAAGGCG AATTGATACT TACCAGCAAC ATCAATCAAG GTGCTGGAGG 

1251 ATTATATTTC CAAGGAGATT TTACGGTCTC GCCTGAAAAT AACGAAACTT 

1301 GGCAAGGCGC GGGCGTTCAT ATCAGTGAAG ACAGTACCGT TACTTGGAAA 

1351 GTAAACGGCG TGGCAAACGA CCGCCTGTCC AAAATCGGCA AAGGCACGCT 

1401 GCACGTTCAA GCCAAAGGGG AAAACCAAGG CTCGATCAGC GTGGGCGACG 

1451 GTACAGTCAT TTTGGATCAG CAGGCAGACG ATAAAGGCAA AAAACAAGCC 

1501 TTTAGTGAAA TCGGCTTGGT CAGCGGCAGG GGTACGGTGC AACTGAATGC 

1551 CGATAATCAG TTCAACCCCG ACAAACTCTA TTTCGGCTTT CGCGGCGGAC 

1601 GTTTGGATTT AAACGGGCAT TCGCTTTCGT TCCACCGTAT TCAAAATACC 

1651 GATGAAGGGG CGATGATTGT CAACCACAAT CAAGACAAAG AATCCACCGT 

1701 TACCATTACA GGCAATAAAG ATATTGCTAC AACCGGCAAT AACAACAGCT 

1751 TGGATAGCAA AAAAGAAATT GCCTACAACG GTTGGTTTGG CGAGAAAGAT 

1801 ACGACCAAAA CGAACGGGCG GCTCAACCTT GTTTACCAGC CCGCCGCAGA 

1851 AGACCGCACC CTGCTGCTTT CCGGCGGAAC AAATTTAAAC GGCAACATCA 

1901 CGCAAACAAA CGGCAAACTG TTTTTCAGCG GCAGACCAAC ACCGCACGCC 

1951 TACAATCATT TAAACGACCA TTGGTCGCAA AAAGAGGGCA TTCCTCGCGG 

2001 GGAAATCGTG TGGGACAACG ACTGGATCAA CCGCACATTT AAAGCGGAAA 

2051 ACTTCCAAAT TAAAGGCGGA CAGGCGGTGG TTTCCCGCAA TGTTGCCAAA 

2101 GTGAAAGGCG ATTGGCATTT GAGCAATCAC GCCCAAGCAG TTTTTGGTGT 

2151 CGCACCGCAT CAAAGCCACA CAATCTGTAC ACGTTCGGAC TGGACGGGTC 

2201 TGACAAATTG TGTCGAAAAA ACCATTACCG ACGATAAAGT GATTGCTTCA 

2251 TTGACTAAGA CCGACATCAG CGGCAATGTC GATCTTGCCG ATCACGCTCA 

2301 TTTAAATCTC ACAGGGCTTG CCACACTCAA CGGCAATCTT AGTGCAAATG 

2351 GCGATACACG TTATACAGTC AGCCACAACG CCACCCAAAA CGGCAACCTT 

2401 AGCCTCGTGG GCAATGCCCA AGCAACATTT AATCAAGCCA CATTAAACGG 

2451 CAACACATCG GCTTCGGGCA ATGCTTCATT TAATCTAAGC GACCACGCCG 

2501 TACAAAACGG CAGTCTGACG CTTTCCGGCA ACGCTAAGGC AAACGTAAGC 

2551 CATTCCGCAC TCAACGGTAA TGTCTCCCTA GCCGATAAGG CAGTATTCCA 

2601 TTTTGAAAGC AGCCGCTTTA CCGGACAAAT CAGCGGCGGC AAGGATACGG 

2651 CATTACACTT AAAAGACAGC GAATGGACGC TGCCGTCAGG CACGGAATTA 

2701 GGCAATTTAA ACCTTGACAA CGCCACCATT ACACTCAATT CCGCCTATCG 

2751 CCACGATGCG GCAGGGGCGC AAACCGGCAG TGCGACAGAT GCGCCGCGCC 

2801 GCCGTTCGCG CCGTTCGCGC CGTTCCCTAT TATCCGTTAC ACCGCCAACT 

2851 TCGGTAGAAT CCCGTTTCAA CACGCTGACG GTAAACGGCA AATTGAACGG 

2901 TCAGGGAACA TTCCGCTTTA TGTCGGAACT CTTCGGCTAC CGCAGCGACA 

2951 AATTGAAGCT GGCGGAAAGT TCCGAAGGCA CTTACACCTT GGCGGTCAAC 

3001 AATACCGGCA ACGAACCTGC AAGCCTCGAA CAATTGACGG TAGTGGAAGG 

3051 AAAAGACAAC AAACCGCTGT CCGAAAACCT TAATTTCACC CTGCAAAACG 

3101 AACACGTCGA TGCCGGCGCG TGGCGTTACC AACTCATCCG CAAAGACGGC 

3151 GAGTTCCGCC TGCATAATCC GGTCAAAGAA CAAGAGCTTT CCGACAAACT 

3201 CGGCAAGGCA GAAGCCAAAA AACAGGCGGA AAAAGACAAC GCGCAAAGCC 

3251 TTGACGCGCT GATTGCGGCC GGGCGCGATG CCGTCGAAAA GACAGAAAGC 

3301 GTTGCCGAAC CGGCCCGGCA GGCAGGCGGG GAAAATGTCG GCATTATGCA 
3351 GGCGGAGGAA GAGAAAAAAC GGGTGCAGGC GGATAAAGAC ACCGCCTTGG

3401 CGAAACAGCG CGAAGCGGAA ACCCGGCCGG CTACCACCGC CTTCCCCCGC

3451 GCCCGCCGCG CCCGCCGGGA TTTGCCGCAA CTGCAACCCC AACCGCAGCC

3501 CCAACCGCAG CGCGACCTGA TCAGCCGTTA TGCCAATAGC GGTTTGAGTG

3551 AATTTTCCGC CACGCTCAAC AGCGTTTTCG CCGTACAGGA CGAATTAGAC

3601 CGCGTATTTG CCGAAGACCG CCGCAACGCC GTTTGGACAA GCGGCATCCG

3651 GGACACCAAA CACTACCGTT CGCAAGATTT CCGCGCCTAC CGCCAACAAA

3701 CCGACCTGCG CCAAATCGGT ATGCAGAAAA ACCTCGGCAG CGGGCGCGTC

3751 GGCATCCTGT TTTCGCACAA CCGGACCGAA AACACCTTCG ACGACGGCAT

3801 CGGCAACTCG GCACGGCTTG CCCACGGCGC CGTTTTCGGG CAATACGGCA

3851 TCGACAGGTT CTACATCGGC ATCAGCGCGG GCGCGGGTTT TAGCAGCGGC

3901 AGCCTTTCAG ACGGCATCGG AGGCAAAATC CGCCGCCGCG TGCTGCATTA

3951 CGGCATTCAG GCACGATACC GCGCCGGTTT CGGCGGATTC GGCATCGAAC

4001 CGCACATCGG CGCAACGCGC TATTTCGTCC AAAAAGCGGA TTACCGCTAC

4051 GAAAACGTCA ATATCGCCAC CCCCGGCCTT GCATTCAACC GCTACCGCGC

4101 GGGCATTAAG GCAGATTATT CATTCAAACC GGCGCAACAC ATTTCCATCA

4151 CGCCTTATTT GAGCCTGTCC TATACCGATG CCGCTTCGGG CAAAGTCCGA

4201 ACACGCGTCA ATACCGCCGT ATTGGCTCAG GATTTCGGCA AAACCCGCAG

4251 TGCGGAATGG GGCGTAAACG CCGAAATCAA AGGTTTCACG CTGTCCCTCC

4301 ACGCTGCCGC CGCCAAAGGC CCGCAACTGG AAGCGCAACA CAGCGCGGGC

4351 ATCAAATTAG GCTACCGCTG GTAA


This corresponds to the amino acid sequence <SEQ ID 650; ORF1-1>:



1 MKTTDKRTTE THRKAPKTGR IRFSPAYLAI CLSFGILPQA WAGHTYFGIN

51 YQYYRDFAEN KGKFAVGAKD IEVYNKKGEL VGKSMTKAPM IDFSVVSRNG

101 VAALVGDQYI VSVAHNGGYN NVDFGAEGRN PDQHRFTYKI VKRNNYKAGT

151 KGHPYGGDYH MPRLHKFVTD AEPVEMTSYM DGRKYIDQNN YPDRVRIGAG

201 RQYWRSDEDE PNNRESSYHI ASAYSWLVGG NTFAQNGSGG GTVNLGSEKI

251 KHSPYGFLPT GGSFGDSGSP MFIYDAQKQK WLINGVLQTG NPYIGKSNGF

301 QLVRKDWFYD EIFAGDTHSV FYEPRQNGKY SFNDDNNGTG KINAKHEHNS

351 LPNRLKTRTV QLFNVSLSET AREPVYHAAG GVNSYRPRLN NGENISFIDE

401 GKGELILTSN INQGAGGLYF QGDFTVSPEN NETWQGAGVH ISEDSTVTWK

451 VNGVANDRLS KIGKGTLHVQ AKGENQGSIS VGDGTVILDQ QADDKGKKQA

501 FSEIGLVSGR GTVQLNADNQ FNPDKLYFGF RGGRLDLNGH SLSFHRIQNT

551 DEGAMIVNHN QDKESTVTIT GNKDIATTGN NNSLDSKKEI AYNGWFGEKD

601 TTKTNGRLNL VYQPAAEDRT LLLSGGTNLN GNITQTNGKL FFSGRPTPHA

651 YNHLNDHWSQ KEGIPRGEIV WDNDWINRTF KAENFQIKGG QAVVSRNVAK

701 VKGDWHLSNH AQAVFGVAPH QSHTICTRSD WTGLTNCVEK TITDDKVIAS

751 LTKTDISGNV DLADHAHLNL TGLATLNGNL SANGDTRYTV SHNATQNGNL

801 SLVGNAQATF NQATLNGNTS ASGNASFNLS DHAVQNGSLT LSGNAKANVS

851 HSALNGNVSL ADKAVFHFES SRFTGQISGG KDTALHLKDS EWTLPSGTEL

901 GNLNLDNATI TLNSAYRHDA AGAQTGSATD APRRRSRRSR RSLLSVTPPT

951 SVESRFNTLT VNGKLNGQGT FRFMSELFGY RSDKLKLAES SEGTYTLAVN

1001 NTGNEPASLE QLTVVEGKDN KPLSENLNFT LQNEHVDAGA WRYQLIRKDG

1051 EFRLHNPVKE QELSDKLGKA EAKKQAEKDN AQSLDALIAA GRDAVEKTES

1101 VAEPARQAGG ENVGIMQAEE EKKRVQADKD TALAKQREAE TRPATTAFPR

1151 ARRARRDLPQ LQPQPQPQPQ RDLISRYANS GLSEFSATLN SVFAVQDELD

1201 RVFAEDRRNA VWTSGIRDTK HYRSQDFRAY RQQTDLRQIG MQKNLGSGRV

1251 GILFSHNRTE NTFDDGIGNS ARLAHGAVFG QYGIDRFYIG ISAGAGFSSG

1301 SLSDGIGGKI RRRVLHYGIQ ARYRAGFGGF GIEPHIGATR YFVQKADYRY

1351 ENVNIATPGL AFNRYRAGIK ADYSFKPAQH ISITPYLSLS YTDAASGKVR

1401 TRVNTAVLAQ DFGKTRSAEW GVNAEIKGFT LSLHAAAAKG PQLEAQHSAG

1451 IKLGYRW*



Computer analysis of these sequences gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A) 

ORF1 shows 57.8% identity over a 1456 aa overlap with an ORF (ORF1a) from strain A of N.
meningitidis:

10 20 30 40 50 60
orf1.pep    MKTTDKRTTETHRKAPKTGRIRFXAAYLAICLSFGILPQAWAGHTYFGINYQYYRDFAEN
orf1a       MKTTDKRTTETHRKAPKTGRIRFSPAYLAICLSFGILPQAWAGHTYFGINYQYYRDFAEN
10 20 30 40 50 60

70 80 90 100 110 120
orf1.pep    KGKFAVGAKDIEVYNKKGELVGKSMTKAPMIDFSVVSRNGVAALVGVQYIVSVAHNGGYN
orf1a       KGKFAVGAKDIEVYNKKGELVGKSMTKAPMIDFSVVSRNGVAALVGDQYIVSVAHNGGYN
70 80 90 100 110 120

130 140 150 160 170 180
orf1.pep    NVDFGAEGXNIXDQXRXTYKIVKRNNYKAGTKGHPYGGDYHMPRLHKXVTDAEPVEMTSY
orf1a       NVDFGAEGXN-PDQHRFSYQIVKRNNYKPDNS-HPYNGDXHMPRLHKFVTDAEPVEMTSD
130 140 150 160 170

190 200 210
orf1.pep    MDGRKYIDQNNYPDRVRIGAGRQYWRSDEDEP NN
orf1a       MRGNTYSDKEKYPERVRIGSGHHYWRYDDDKHGDLSYSGAWLIGGNTHMQGWGNNGVXSL
180 190 200 210 220 230

220 230 240 250 260
orf1.pep    RESSYH IA SGSPMFIYDAQKQKWLINGVLQTGNPYIGKSNGFQLVRK
orf1a       SGDVRHANDYGPMPIAGAAGDSGSPMFIYDKTNNKWLLNGVLQTGYPYSGRENGFQLIRK
240 250 260 270 280 290



270 280 290 300 310 320
orf1.pep    DWFYDEIFAGDTHSVFYEPRQNGKYSFNDDNNGTGKINAKHEHNSLPNRLKTRTVQLFNV
orf1a       DWFYDDIYRGDTHTVXFEPRSNGHFSFTSNNNGTGTVTETNEKVSNP-KLKVQTVRLFDE
300 310 320 330 340 350

330 340 350 360 370 380
orf1.pep    SLSETAREPVYHAAGGVNSYRPRLNNGENISFIDEGKGELILTSNINQGAGGLYFQGDFT
orf1a       SLNETDKEPVY-AAGGVNQYRPRLNNGENLSFIDYGNGKLILSNNINQGAGGLYFEGDFT
360 370 380 390 400 410

390 400 410 420 430
orf1.pep    VSPENNETWQGAGVHISEDSTVTWKVNGVANDRLSKIGKGTL
orf1a       VSPENNETWQGAGVHISEDSTVTWKVNGVANDRLSKIGKGTLHVQAKGENQGSISVGDGT
420 430 440 450 460 470

orf1a       VILDQQADDKGKKQAFSEIGLXSGRGTVQLNADNQFNPDKLYFGFRGGRLDLNGHSLSFH
480 490 500 510 520 530



orf1a       RIQBTDEGAMIXXHNATTTSTVTITGNESITQPSGKNINRLNYSKEIAYNGWFGEKDTTK
540 550 560 570 580 590

orf1a       TNGRLNLVYQPAAEDRTXLLSGGTNLNGNITQTNGKLFFSGRPTPHAYNHLGSGWSKMEG
600 610 620 630 640 650

orf1a       IPQGEIVWDNDWIXRTFKAENFHIQGGQAVISRNVAKVEGDXHLSNHAQAVFGVAPHQSH
660 670 680 690 700 710

440 450 460 470 480
orf1.pep                    XXXXXDKVTASLTKTDISGNVDLADHAHLNLTGLATLNGNLSAN
orf1a       TICTRSDWTGLTNCVEXXITDDKVIASLTKTDXSGXVXLXXXXXXXLXGXAXLXGNLSAN
720 730 740 750 760 770

490 500 510 520 530 540
orf1.pep    GDTRYTVSHNATQNGNXSLVXNAQATFNQATLNGNTSASGNASFNLSDHAVQNGSLTLSG
orf1a       GDTRYTVSHNATQNGNLSLVGNAQATFNQATLNGNXSXSGNASFNLSNNAAQNGSLTLSD
780 790 800 810 820 830

550 560 570 580 590 600 

orf 1 . pep NAKANVSHSALNGNVSLADKAVFHFESSRFTGQISGGKDTALHLKDSEWTLPSGXELGNL 
I I I I I I I I I I I I I I I I I I I I I I I I I 1 : I I I II I : I I : I I I I I I I I I I I I I I I I : I I I I I 
orfla NAKANVSHSALNGNVSLADKAVFHFENSRFTGQLSGSKXTALHLKDSEWTLPSGTELGNL 
840 850 860 870 880 890 



610 620 630 640 650 660 

orf 1 . pep NLDNATITLNSAYRHDAAGAQTGSATDAPRRRSRRSRRSLLXVTPPTSVESRFNTLTVNG 
I I I I I I I I I I I I I I I I I I I I I 11 :: I : I I I I I I I I II I I I I I I I I I I I I I I I I I I 

orfla NLDNATITLNSAYRHDAAGAQTGXVSDTPRRRSRRS LLSVTPPTSVESRFNTLTVNG 

900 910 920 930 940 950



670 680 690 700 710 720 

orf1.pep KLNGQGTFRFMSELFGYRSDKLKLAESSEGTYTLAVNNTGNEPASLEQLTWEGKDNKPL
III I I I I i I I I I I I I I 1 I I I I I I I I i I I I I I I I 1 I I I I I I I I : I I : I 1 II I I I M I I I I
orf1a KLNXQGTFRFMSELFGYRSDKLKLAESSEGTYTLAVNNTGNEPVSLDQLTWEGKDNKPL

960 970 980 990 1000 1010 



730 740 750 

orf1.pep SENLNFTLQNEHVDAGAW

I I I 1 I I I I I I I I I I I I I I

orf1a SENLNFTLQNEHVDAGAWRYQLIRKDGEFRLHNPVKEQELSDKLGKAEAKKQAEKDNAQS
1020 1030 1040 1050 1060 1070 



orf1.pep
orf1a LDALIAAGRDAAEKTESVAEPARXAGGENVGIMQAEEEKKRVQADKDSALAKQREAETRP
1080 1090 1100 1110 1120 1130 

760

orf1.pep LDR

I i I

orf1a XTTAFPRARXARRDLPQPQPQPQPQPQPQRDLXSRYANSGLSEFSATLNSVFAVQDELDR
1140 1150 1160 1170 1180 1190 


770 780 790 800 810 820 

orf1.pep VFAEDRRNAVWTSGIRDTKHYRSQDFRAYRQQTDLRQIGMQKNLGSGRVGILFSHNRTEN
I I C I t I K I I I I I I II I 1 1 I I I I I M I I 11 I I I I ! I t I I I M I M I I I I I II I I I I I I I
orf1a VFAEDRRNAVWTSXIRXTKHYRSQDFRAYRQQTDLRQIGMQKNLGSGRVGILFSHNRTEN
1200 1210 1220 1230 1240 1250






830 840 850 860 870 880 

orf1.pep TFDDGIGNSARLAHGAVFGQYGIDRFYIGISAGAGFSSGSLSDGIGXKXRRRVLHYGIQA

: II I I I I I I I I I I I I I I I I I I I II I I I I : I I I I II I MINI I I I I I I I I I I I I
orf1a XFDDGIGNSARLAHGAVFGQYGIGRFDIGISTGAGFSSGXLSDGIGGKIRRRVLHYGIQA
1260 1270 1280 1290 1300 1310 






890 900 910 920 930 940 

orf 1 . pep RYRAGFGGFGIEPHIGATRYFVQKADYRYENVNIATPGLAFNRYRAGIKADYSFKPAQHI 

I I I I I I I I I I I I I : I I I I I I I I I I I I I I I II I I I I I I I II I I I I I I I I I I I I I I I I I II 
orfla RYRAGFGGFGIEPYIGATRYFVQKADYRYENVNIATPGLAFNRYRAGIKADYSFKPAQHX 
1320 1330 1340 1350 1360 1370 



950 960 970 980 990 1000 

orf1.pep SITPYLSLSYTDAASGKVRTRVNTAVLAQDFGKTRSAEWGVNAEIKGFTLSLHAAAAKGP

Mill I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I I I I I I I I I II I I
orf1a SITPYXSLSYTDAASGKVRTRVNTAVLAQDFGKTRSAEWGVNAEIKGFTLSXHAAAAKGP
1380 1390 1400 1410 1420 1430 



1010 1020

orf1.pep QLEAQHSAGIKLGYRWX

I I I II I I I I I I I I I I I I I

orf1a QLEAQHSAGIKLGYRWX

1440 1450 
1 ATGAAAACAA CCGACAAACG GACAACCGAA ACACACCGCA AAGCCCCGAA

51 AACCGGCCGC ATCCGCTTCT CGCCTGCTTA CTTAGCCATA TGCCTGTCGT

101 TCGGCATTCT TCCCCAAGCT TGGGCGGGAC ACACTTATTT CGGCATCAAC

151 TACCAATACT ATCGCGACTT TGCCGAAAAT AAAGGCAAGT TTGCAGTCGG

201 GGCGAAAGAT ATTGAGGTNT ACAACAAAAA AGGGGAGTTG GTCGGCAAAT

251 CAATGACAAA AGCCCCGATG ATTGATTTTT CTGTGGTGTC GCGTAACGGC

301 GTGGCGGCAT TGGTGGGCGA TCAATATATT GTGAGCGTGG CACATAACGG

351 CGGCTATAAC AACGTTGATT TTGGTGCGGA AGGAAGNAAT CCCGATCAGC

401 ACCGTTTTTC TTACCAAATT GTGAAAAGAA ATAATTATAA GCCTGACAAT

451 TCACACCCTT ACAACGGCGA TTANCATATG CCGCGTTTGC ATAAATTTGT

501 CACAGATGCA GAACCTGTCG AAATGACGAG TGACATGAGG GGGAATACCT

551 ATTCCGATAA AGAAAAATAT CCCGAGCGTG TCCGCATCGG CTCAGGACAC

601 CACTATTGGC GTTATGATGA TGACAAACAC GGCGATTTAT CCTACTCCGG

651 CGCATGGTTA ATTGGCGGCA ATACACATAT GCAGGGTTGG GGAAATAATG

701 GCGTANTTAG TTTGAGCGGC GATGTGCGCC ATGCCAACGA CTATGGCCCT

751 ATGCCGATTG CAGGTGCGGC AGGCGACAGC GGTTCGCCAA TGTTTATTTA

801 TGACAAAACA AACAATAAAT GGCTGCTCAA CGGAGTTTTA CAAACCGGCT

851 ACCCTTATTC CGGCAGGGAA AACGGTTTCC AGCTGATACG CAAAGATTGG

901 TTCTACGATG ACATTTACAG AGGCGATACA CATACCGTCT NTTTTGAACC

951 GCGCAGTAAC GGACATTTTT CCTTTACATC CAACAACAAC GGTACGGGTA

1001 CGGTAACAGA AACCAACGAA AAGGTNTCCA ATCCAAAGCT TAAAGTACAG

1051 ACAGTCCGAC TGTTTGACGA ATCTTTGAAT GAAACTGATA AAGAACCAGT

1101 TTACGCGGCA GGGGGTGTTA ATCAGTACCG TCCAAGGTTA AACAACGGTG

1151 AAAACCTTTC TTTTATCGAT TACGGCAACG GCAAACTCAT CTTATCAAAC

1201 AACATCAACC AAGGCGCGGG CGGTTTGTAT TTTGAAGGTG ATTTTACGGT

1251 CTCGCCTGAA AACAACGAAA CGTGGCAAGG CGCGGGCGTT CATATCAGTG

1301 AAGACAGTAC CGTTACTTGG AAAGTAAACG GCGTGGCAAA CGACCGCCTG

1351 TCCAAAATCG GCAAAGGCAC GCTGCACGTT CAAGCCAAAG GGGAAAACCA

1401 AGGCTCGATC AGCGTGGGCG ACGGTACAGT CATTTTGGAT CAGCAGGCAG

1451 ACGATAAAGG CAAAAAACAA GCCTTTAGTG AAATCGGCTT GNTCAGCGGC

1501 AGGGGTACGG TGCAACTGAA TGCCGATAAT CAGTTCAACC CCGACAAACT

1551 CTATTTCGGC TTTCGCGGCG GACGTTTGGA TTTAAACGGG CATTCGCTTT

1601 CGTTCCACCG TATTCAAAAT ACCGATGAAG GGGCGATGAT TGNCNATCAT

1651 AATGCCACAA CAACATCCAC CGTTACCATT ACAGGGAATG AAAGTATTAC

1701 ACAACCGAGT GGTAAGAATA TCAATAGACT TAATTACAGC AAAGAAATTG

1751 CCTACAACGG TTGGTTTGGC GAGAAAGATA CGACCAAAAC GAACGGGCGG

1801 CTCAACCTTG TTTACCAGCC CGCCGCAGAA GACCGCACCC NGCTGCTTTC

1851 CGGCGGAACA AATTTAAACG GCAACATCAC GCAAACAAAC GGCAAACTGT

1901 TTTTCAGCGG CAGACCGACA CCGCACGCCT ACAATCATTT AGGAAGCGGG

1951 TGGTCAAAAA TGGAAGGTAT CCCACAAGGA GAAATCGTGT GGGACAACGA

2001 CTGGATCNAC CGCACGTTTA AAGCGGAAAA TTTCCATATT CAGGGCGGGC

2051 AGGCGGTGAT TTCCCGCAAT GTTGCCAAAG TGGAAGGCGA TTGNCATTTG

2101 AGCAATCACG CCCAAGCAGT TTTTGGTGTC GCACCGCATC AAAGCCATAC

2151 AATCTGTACA CGTTCGGACT GGACNGGTCT GACAAATTGT GTCGAANAAA

2201 NCATTACCGA CGATAAAGTG ATTGCTTCAT TGACTAAGAC NGACNTNAGC

2251 GGCANTGTNA GNCTNNCCNA TNACGNTNNT TNAAANCTCN CNGGGCNTGC

2301 NNCACTNAAN GGCAATCTTA GTGCAAATGG CGATACACGT TATACAGTCA

2351 GCCACAACGC CACCCAAAAC GGCAACCTTA GCCTCGTGGG CAATGCCCAA

2401 GCAACATTTA ATCAAGCCAC ATTAAACGGC AACNCATCGG NTTCGGGCAA

2451 TGCTTCATTT AATCTAAGCA ACAACGCCGC ACAAAACGGC AGTCTGACGC

2501 TTTCCGACAA CGCTAAGGCA AACGTAAGCC ATTCCGCACT CAACGGCAAT

2551 GTCTCCCTAG CCGATAAGGC AGTATTCCAT TTTGAAAACA GCCGCTTTAC

2601 CGGACAACTC AGCGGCAGCA AGGANACAGC ATTACACTTA AAAGACAGCG

2651 AATGGACGCT GCCGTCAGGC ACGGAATTAG GCAATTTAAA CCTTGACAAC

2701 GCCACCATTA CACTCAATTC CGCCTATCGC CACGATGCTG CAGGCGCGCA

2751 AACCGGCAGN GTGTCAGACA CGCCGCGCCG CCGTTCGCGC CGTTCCCTAT

2801 TATCCGTTAC ACCGCCAACT TCGGTAGAAT CCCGTTTCAA CACGCTGACG

2851 GTAAACGGCA AATTGAACNG TCAAGGAACA TTCCGCTTTA TGTCGGAACT

2901 CTTCGGCTAC CGAAGCGACA AATTGAAGCT GGCGGAAAGT TCCGAAGGNA

2951 CTTACACCTT GGCGGTCAAC AATACCGGCA ACGAACCCGT AAGCCTCGAT

3001 CAATTGACGG TAGTGGAAGG GAAAGACAAC AAACCGCTGT CCGAAAACCT

3051 TAATTTCACC CTGCAAAACG AACACGTCGA TGCCGGCGCG TGGCGTTACC

3101 AACTCATCCG CAAAGACGGC GAGTTCCGCC TGCATAATCC GGTCAAAGAA

3151 CAAGAGCTTT CCGACAAACT CGGCAAGGCA GAAGCCAAAA AACAGGCGGA

3201 AAAAGACAAC GCGCAAAGCC TTGACGCGCT GATTGCGGCC GGGCGCGATG

3251 CCGCCGAAAA GACAGAAAGC GTTGCCGAAC CGGCCCGGCN GGCAGGCGGG

3301 GAAAATGTCG GCATTATGCA GGCGGAGGAA GAGAAAAAAC GGGTGCAGGC

3351 GGATAAAGAC AGCGCNTTGG CGAAACAGCG CGAAGCGGAA ACCCGGCCGG

3401 NTACCACCGC CTTCCCCCGC GCCCGCNGCG CCCGCCGGGA TTTGCCGCAA

3451 CCGCAGCCCC AACCGCAACC TCAACCCCAA CCGCAGCGCG ACCTGATNAG

3501 CCGTTATGCC AATAGCGGTT TGAGTGAATT TTCCGCCACG CTCAACAGCG

3551 TTTTCGCCGT ACAGGACGAA TTGGACCGCG TGTTTGCCGA AGACCGCCGC

3601 AACGCNGTTT GGACAAGCNG CATCCGGNAC ACCAAACACT ACCGTTCGCA

3651 AGATTTCCGC GCCTACCGCC AACAAACCGA CCTGCGCCAA ATCGGTATGC 

3701 AGAAAAACCT CGGCAGCGGG CGCGTCGGCA TCCTGTTTTC GCACAACCGG 

3751 ACCGAAAACA NCTTCGACGA CGGCATCGGC AACTCGGCAC GGCTTGCCCA 

3801 CGGCGCCGTT TTCGGGCAAT ACGGCATCGG CAGGTTCGAC ATCGGCATCA 

3851 GCACGGGCGC GGGTTTTAGC AGCGGCANTC TNTCAGACGG CATCGGAGGC 

3901 AAAATCCGCC GCCGCGTGCT GCATTACGGC ATTCAGGCAC GATACCGCGC 

3951 CGGTTTCGGC GGATTCGGCA TCGAACCGTA CATCGGCGCA ACGCGCTATT 

4001 TCGTCCAAAA AGCGGATTAC CGCTACGAAA ACGTCAATAT CGCCACCCCC 

4051 GGTCTTGCGT TCAACCGNTA CCGNGCGGGC ATTAAGGCAG ATTATTCATT 

4101 CAAACCGGCG CAACACATNT CCATCACNCC TTATTTNAGC CTGTCCTATA 

4151 CCGATGCCGC TTCGGGCAAA GTCCGAACAC GCGTCAATAC CGCNGTATTG 

4201 GCTCAGGATT TCGGCAAAAC CCGCAGTGCG GAATGGGGCG TAAACGCCGA 

4251 AATCAAAGGT TTCACGCTGT CCNTCCACGC TGCCGCCGCC AAAGGNCCGC 

4301 AACTGGAAGC GCAACACAGC GCGGGCATCA AATTAGGCTA CCGCTGGTAA 

This encodes a protein having amino acid sequence <SEQ ID 652>: 



1 MKTTDKRTTE THRKAPKTGR IRFSPAYLAI CLSFGILPQA WAGHTYFGIN
51 YQYYRDFAEN KGKFAVGAKD IEVYNKKGEL VGKSMTKAPM IDFSVVSRNG
101 VAALVGDQYI VSVAHNGGYN NVDFGAEGXN PDQHRFSYQI VKRNNYKPDN
151 SHPYNGDXHM PRLHKFVTDA EPVEMTSDMR GNTYSDKEKY PERVRIGSGH
201 HYWRYDDDKH GDLSYSGAWL IGGNTHMQGW GNNGVXSLSG DVRHANDYGP
251 MPIAGAAGDS GSPMFIYDKT NNKWLLNGVL QTGYPYSGRE NGFQLIRKDW
301 FYDDIYRGDT HTVXFEPRSN GHFSFTSNNN GTGTVTETNE KVSNPKLKVQ
351 TVRLFDESLN ETDKEPVYAA GGVNQYRPRL NNGENLSFID YGNGKLILSN
401 NINQGAGGLY FEGDFTVSPE NNETWQGAGV HISEDSTVTW KVNGVANDRL
451 SKIGKGTLHV QAKGENQGSI SVGDGTVILD QQADDKGKKQ AFSEIGLXSG
501 RGTVQLNADN QFNPDKLYFG FRGGRLDLNG HSLSFHRIQN TDEGAMIXXH
551 NATTTSTVTI TGNESITQPS GKNINRLNYS KEIAYNGWFG EKDTTKTNGR
601 LNLVYQPAAE DRTXLLSGGT NLNGNITQTN GKLFFSGRPT PHAYNHLGSG
651 WSKMEGIPQG EIVWDNDWIX RTFKAENFHI QGGQAVISRN VAKVEGDXHL
701 SNHAQAVFGV APHQSHTICT RSDWTGLTNC VEXXITDDKV IASLTKTDXS
751 GXVXLXXXXX XXLXGXAXLX GNLSANGDTR YTVSHNATQN GNLSLVGNAQ
801 ATFNQATLNG NXSXSGNASF NLSNNAAQNG SLTLSDNAKA NVSHSALNGN
851 VSLADKAVFH FENSRFTGQL SGSKXTALHL KDSEWTLPSG TELGNLNLDN
901 ATITLNSAYR HDAAGAQTGX VSDTPRRRSR RSLLSVTPPT SVESRFNTLT
951 VNGKLNXQGT FRFMSELFGY RSDKLKLAES SEGTYTLAVN NTGNEPVSLD
1001 QLTVVEGKDN KPLSENLNFT LQNEHVDAGA WRYQLIRKDG EFRLHNPVKE
1051 QELSDKLGKA EAKKQAEKDN AQSLDALIAA GRDAAEKTES VAEPARXAGG
1101 ENVGIMQAEE EKKRVQADKD SALAKQREAE TRPXTTAFPR ARXARRDLPQ
1151 PQPQPQPQPQ PQRDLXSRYA NSGLSEFSAT LNSVFAVQDE LDRVFAEDRR
1201 NAVWTSXIRX TKHYRSQDFR AYRQQTDLRQ IGMQKNLGSG RVGILFSHNR
1251 TENXFDDGIG NSARLAHGAV FGQYGIGRFD IGISTGAGFS SGXLSDGIGG
1301 KIRRRVLHYG IQARYRAGFG GFGIEPYIGA TRYFVQKADY RYENVNIATP
1351 GLAFNRYRAG IKADYSFKPA QHXSITPYXS LSYTDAASGK VRTRVNTAVL
1401 AQDFGKTRSA EWGVNAEIKG FTLSXHAAAA KGPQLEAQHS AGIKLGYRW*



A transmembrane region is underlined. 
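The statement that the nucleotide sequence encodes <SEQ ID 652> means reading it codon by codon in frame 1 from the initial ATG. The following is a minimal Python sketch of that conceptual translation, assuming the standard genetic code (NCBI translation table 1). The names `translate` and `CODON_TABLE` are illustrative only. As a simplification, any codon containing an ambiguous base (N) is mapped to X, even though some such codons (GCN, for example) still specify a single amino acid; the protein listing above sometimes resolves those cases.

```python
import re

BASES = "TCAG"
# Standard genetic code (assumed NCBI table 1); codons ordered TTT, TTC, TTA, TTG, TCT, ..., GGG.
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}


def translate(listing: str) -> str:
    """Translate a nucleotide listing in reading frame 1.

    Position numbers, spaces and line breaks are discarded first. A codon
    containing N becomes 'X', stop codons become '*', and a trailing
    partial codon is ignored.
    """
    clean = re.sub(r"[^ACGTN]", "", listing.upper())
    residues = [
        CODON_TABLE.get(clean[pos:pos + 3], "X")
        for pos in range(0, len(clean) - 2, 3)
    ]
    return "".join(residues)


print(translate("1 ATGAAAACAA CCGACAAACG GACAACCGAA ACACACCGCA AAGCCCCGAA"))
```

Applied to the first numbered row of the nucleotide listing, the sketch prints MKTTDKRTTETHRKAP, which matches the N-terminus of <SEQ ID 652>. The final two bases, which do not form a complete codon, are dropped.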



ORF1-1 shows 86.3% identity over a 1462 aa overlap with ORF1a:

10 20 30 40 50 60 

or f la . pep MKTTDKRTTETHRKAPKTGRIRFSPAYLAICLSFG I LPQAWAGHTYFG IN YQYYRDFAEN 

I I I II I I I I I I I I I I I I I I I I I 1 I I I I I I I II I I I I I I I I I I I I I I I I I I I I I i I I I I I I 
orfl-1 MKTTDKRTTETHRKAPKTGRIRFSPAYLAICLSFGI LPQAWAGHTYFG IN YQYYRDFAEN 

10 20 30 40 50 60 

70 80 90 100 110 120 

or f la . pep KGKFAVGAKD I EVYNKKGELVGKSMTKAPM I DFSWSRNGVAALVGDQY I VSVAHNGGYN 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I i I I I I I I I I I I I I 
orfl-1 KGKFAVGAKDI EVYNKKGELVGKSMTKAPM I DFSWSRNGVAALVGDQY I VSVAHNGGYN 

70 80 90 100 110 120 

130 140 150 160 170 179 

orf la . pep NVDFGAEGXNPDQHRFSYQIVKRNNYKPDNS-HPYNGDXHMPRLHKFVTDAEPVEMTSDM 
llllllll I I I I I I I : I : I I I I I I I I :: 111:11 I I I I I I I II I I t I I I M I I I 
orfl-1 NVDFGAEGRNPDQHRFTYKIVKRNNYKAGTKGHPYGGDYHMPRLHKFVTDAEPVEMTSYM 

130 140 150 160 170 180 



900 910 920 930 940 
or f la . pep TELGNLNLDNATITLNSAYRHDAAGAQTGXVSDTPRRRSRRS LLSVTPPTSVESRFN 

I I I I I I I I I I I I I I II I I I I I I I I I I I I I ::|:llllllll I I I I I I I I I I I I I I 1 
orf1-1 TELGNLNLDNATITLNSAYRHDAAGAQTGSATDAPRRRSRRSRRSLLSVTPPTSVESRFN

900 910 920 930 940 950 

950 960 970 980 990 1000 

or f la . pep TLTVNGKLNXQGTFRFMSELFGYRSDKLKLAESSEGTYTLAVNNTGNEPVSLDQLTWEG 

MINIM! I M M I M M M I M M M M II II M M I M I I M I I I M I M M M I I 
orfl-1 TLTVNGKLNGQGTFRFMSELFGYRSDKLKLAESSEGTYTLAVNNTGNEPASLEQLTWEG 

960 970 980 990 1000 1010 

1010 1020 1030 1040 1050 1060 

or f la . pep KDNKPLSENLNFTLQNEHVDAGAWRYQLIRKDGEFRLHNPVKEQELSDKLGKAEAKKQAE 
I I I I II I I I I II I I I I II II I I I I II I I I I I I I I I I I II I I I I I M II I I I M I I I I I II 
orfl-1 KDNKPLSENLNFTLQNEHVDAGAWRYQLIRKDGEFRLHNPVKEQELSDKLGKAEAKKQAE 

1020 1030 1040 1050 1060 1070 






1070 1080 1090 1100 1110 1120 

or f la . pep KDNAQSLDALIAAGRDAAEKTESVAEPARXAGGENVGIMQAEEEKKRVQADKDSALAKQR 
I M II II II I I II I M I :\ II II I I I I I I II I I II I II I I II II II I II I I I M I M II 
orfl-1 KDNAQSLDALIAAGRDAVEKTESVAEPARQAGGENVGIMQAEEEKKRVQADKDTALAKQR 

1080 1090 1100 1110 1120 1130 

1130 1140 1150 1160 1170 1180 

orf la . pep EAETRPXTTAFPRARXARRDLPQPQPQPQPQPQPQRDLXSRYANSGLSEFSATLNSVFAV 
I I I I II I I I M I I I I I II I I I II II I I I I I II I I I I I I I I I I I I I I I I I 1 I I I I 
orf1-1 EAETRPATTAFPRARRARRDLPQLQPQPQPQP--QRDLISRYANSGLSEFSATLNSVFAV

1140 1150 1160 1170 1180 1190 

1190 1200 1210 1220 1230 1240 

orf la. pep QDELDRVFAEDRRNAVWTSXIRXTKHYRSQDFRAYRQQTDLRQIGMQKNLGSGRVGILFS 
I I I I II I II I I I I I I I I I I II I II I I I 1 M M II I II II I I II M I II II II I I I II I 
orfl-1 QDELDRVFAEDRRNAVWTSGIRDTKHYRSQDFRAYRQQTDLRQIGMQKNLGSGRVGILFS 
1200 1210 1220 1230 1240 1250 






1250 1260 1270 1280 1290 1300 

orf la . pep HNRTENXFDDGIGNSARLAHGAVFGQYGIGRFDIGISTGAGFSSGXLSDGIGGKIRRRVL 
I I I I I I : I I I I I I I I I I I I I I I I I II I I I II I I I I : I II I I II I I I I I II I I I I I I I 
orfl-1 HNRTENTFDDGIGNSARLAHGAVFGQYGIDRFYIGISAGAGFSSGSLSDGIGGKIRRRVL 
1260 1270 1280 1290 1300 1310 






1310 1320 1330 1340 1350 1360 

orf la . pep HYGIQARYRAGFGGFGIEPYIGATRYFVQKADYRYENVNIATPGLAFNRYRAGIKADYSF 
I I I I I II II I I I I I II I I I M II I I II I M I II I I I I I I I I I I I I I II i I I I I I I I I M I 
orf1-1 HYGIQARYRAGFGGFGIEPHIGATRYFVQKADYRYENVNIATPGLAFNRYRAGIKADYSF

1320 1330 1340 1350 1360 1370 






1370 1380 1390 1400 1410 1420 

or f la . pep KPAQHXSITPYXSLSYTDAASGKVRTRVNTAVLAQDFGKTRSAEWGVNAEIKGFTLSXHA 
Mill I I II I II I I I I I I II I I II II I I I II I II I I II I II I I II II II II II II II 
orfl-1 KPAQHISITPYLSLSYTDAASGKVRTRVNTAVLAQDFGKTRSAEWGVNAEIKGFTLSLHA 
1380 1390 1400 1410 1420 1430 

1430 1440 1450 

orf la. pep . AAAKGPQLEAQHSAGIKLGYRWX 
I II I II M M II II I II I II I II 
orfl-1 AAAKGPQLEAQHSAGIKLGYRWX 
1440 1450 
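Identity figures such as "86.3% identity over a 1462 aa overlap" are ratios of identical aligned positions to aligned length. The following is a minimal Python sketch of one common way to compute such a figure from two sequences that have already been aligned. It assumes that gap columns ('-') count toward the overlap and that an undetermined residue (X) never scores as identical. The alignment program that produced the figures in this document may use different conventions, so the sketch is not expected to reproduce them exactly. The function name `percent_identity` is illustrative.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> tuple[int, int, float]:
    """Return (identical columns, overlap length, percent identity).

    Both inputs must already be aligned to the same length, with '-' for gaps.
    Assumed convention: every column counts toward the overlap, and columns
    holding a gap or an undetermined residue 'X' are never counted as identical.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    identical = sum(
        1 for a, b in zip(aligned_a, aligned_b) if a == b and a not in "-X"
    )
    overlap = len(aligned_a)
    return identical, overlap, 100.0 * identical / overlap


# 60-column window taken from the ORF1a / ORF1-1 alignment above (block starting NVDFGAEG).
orf1a = "NVDFGAEGXNPDQHRFSYQIVKRNNYKPDNS-HPYNGDXHMPRLHKFVTDAEPVEMTSDM"
orf1_1 = "NVDFGAEGRNPDQHRFTYKIVKRNNYKAGTKGHPYGGDYHMPRLHKFVTDAEPVEMTSYM"
identical, overlap, pct = percent_identity(orf1a, orf1_1)
print(f"{pct:.1f}% identity in {overlap} aa overlap ({identical} identical)")
```

On this window the sketch counts 49 identical positions out of 60 columns, which gives about 81.7%. That is lower than the whole-protein figure because the window falls in one of the more divergent stretches.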

Homology with the adhesion and penetration protein hap precursor of H. influenzae (accession number P45387)
Amino acids 23-423 of ORF1 show 59% aa identity with the hap protein in a 450 aa overlap:






orfl 23 FXAAYLAI CL S FG I LPQAWAGHTYFG IN YQY YRD FAENKGK FAVGAKD I EV YNKKGELVG 82 
F +L C+S GI QAWAGHT Y FG I + YQY YRD FAENKGK F VGAK+IEVYNK+G+LVG 
6 FRLN FLTACVSLG I AS QAWAGHT Y FG I D YQY YRD FAENKGK FTVGAKN I E VYNKE GQLVG 65 



hap 6 
orfl 83 



hap 



KSMTKAPMIDFSWSRNGVAALVGVQYIVSVAHNGGYNNVDFGAEGXNIXDQXRXTYKIV 142 
SMTKAPMI DFS WSRNGVAALVG QYIVSVAHNGGYN+VDFGAEG N DQ R TY+IV 
66 TSMTKAPMIDFSWSRNGVAALVGDQYIVSVAHNGGYNDVDFGAEGRN-PDQHRFTYQIV 124 
orfl


143 : 




hap 


125 : 


orfl 


203 i 

< 




hap 


10 J < 




orfl 


223 




hap 


245 i 




orfl 


278 




hap 


305 i 




orfl 


335 ; 




hap 


364 J 




orfl 


394 i 

i 




hap 


424 i 




Amino acids 7 15-] 




Orfl 


41 i 




hap 


1 

733 1 




orfl 


99 < 




hap 


793 1 




orfl 


159 : 




hap 


853 : 




orfl 


219 ( 




hap 


( 

900 ( 




orfl 


279 : 




hap 


960 : 




Amino acids 1192 




Orfl 


1 




hap 


1135 




orfl 


61 




hap 


1195 




orfl 


121 




hap 


1255 




orfl 


181 




hap 


1315 




orfl 


241 




hap 


1375 



KRNNY+A + HPY GDYHMPRLHK VT+AEPV MT+ MDG+ Y D+ NYP+RVRIG+GR 
KRNNYQAWERKHPYDGDYHMPRLHKFVTEAEPVGMTTNMDGKVYADRENYPERVRIGSGR 184 

QYWRSDEDEPNNRESSYHIA 222 

QYWR+D+DE N SSY+++ 



-SGSPMFIYDAQKQKWLINGVLQTGNPYIGKSNGFQLVRKDWFYDEIFAGDTHSVF 277 
SGSPMFIYDA+K++WLIN VLQTG+P+ G+ NGFQL+R++WFY+E+ A DT SVF 



Y P NG YSF +N+GTGK+ + + + + TV+LFN SL++TA+E V A 



A G N Y+PR+ G+NI D+GKG L + +N I NQG AGGLY F+G + F V +NN TWQGA 



GV I +D+TV WKV+ NDRLSKIG GTL 



DTRYTVSHNATQ-NGNXSLVXNAQATFNQ-ATLNGNTSASGNASFNLSDHAVQNGSLTLS 98 
DT+ S TQ NG+ +L NA + A LNGN + ++ F LS++A Q G++ LS 



GNAKANVSHSALNGNVSLADKAVFHFESSRFTGQISGGKDTALHLKDSEWTLPSGXELGN 158 
+A A V+++ LNGNV L D A F ++S F QI G KDT + L+++ WT+PS L N 



L L+N+T+TLNSAY + S+ +AP L T PTS E RFNTLTVN 

LTLNNSTVTLNSAY S AS SNNAPRHRRS LETETTPTSAEHRFNTLTVN 899 



GKL+GQGTF+F S LFGY+SDKLKL+ +EG YTL+V NTG EP +LEQLT++E DNKP 



LS+ L FTL+N+HVDAGA 



LDRVFAEDRRNAVWTSGIRDTKHYRSQDFRAYRQQTDLRQIGMQKNLGSGRVGILFSHNR 60 
LDR+F + ++AVWT+ +D + Y S FRAY+Q+T +LRQIG+QK L +GR+G +FSH+R 



TENT FDDGI GNS ARLAHGAVFGQYG I DRFYXXXXXXXXXXXXXXXXXIGXKXRRRVLHYG 120 
++NTFD+ +NAL+FQY KR+ ++YG 



IQARYRAGFGGFGIEPHIGATRYFVQKADYRYENVNIATPGLAFNRYRAGIKADYSFKPA 180 
+ A Y+ G GI+P+ G RYF+++ +Y+ E V + TP LAFNRY AGI+ DY+F P 



QHI S IT PYLSLSYTDAASGECVRTRVNTAVLAQDFGKTRSAEWGVNAE IKGFTLSLHAAAA 240 
+IS+ PY ++Y D ++ V+T VN VL Q FG+ E G+ AEI F +S + + 



KGPQLEAQHS AG I KLG YRW 259 
+G QL Q + G+KLGYRW 
Homology with a predicted ORF from N. gonorrhoeae

The blocks of ORF1 show 83.5%, 88.3%, and 97.7% identity over 467, 298, and 259 aa overlaps, respectively, with a predicted ORF (ORF1ng) from N. gonorrhoeae:






orf l.pep 
orf lng 
orf l.pep 
orf lng 
orf l.pep 
orf lng 
orf l.pep 
orf lng 
orf l.pep 
orflng 
orf l.pep 
orflng 
orf l.pep 
orflng 
orf l.pep 
orflng 
orf l.pep 
orflng 
orf l.pep 
orflng 
orf l.pep 
orflng 
orf l.pep 
orflng 
orf l.pep 
orflng 
orf l.pep 
orflng 
orf 1 .pep 
orflng 
orf 1 .pep 
orflng 



MKTTDKRTTETHRKAPKTGRIRFXAAYLAICLSFGILPQAWAGHTYFGINYQYYRDFAEN 
I I I I I I I I I I I I I I I I I I I I I I I 1 I I I I I I I I I I I I I ! I I I I I I I I I I I I I 1 I I I I I 
MKTTDKRTTETHRKAPKTGRIRFSPAYLAICLSFGILPQARAGHTYFGINYQYYRDFAEN 



MDGRKY I DQNN YP DRVRIGAGRQ YWRS DE DEPNNRE S S YHI AS 

Mi II I I : II I I I I I II I I I I I I I I I I I I I I I I I I I I I I I 

MDGWKYADLNKYPDRVRIGAGRQYWRS DEDE PNNRE S S YHIAS AYS WLVGGNT FAQNGSG 



VQLFNVS LSETARE PVYHAAGGVNS YRPRLNNGEN I S FI DEGKGELI LTSN INQGAGGLY 
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I : I I I I I I I I I I I II I I I I I I 
VQL FN VS L SE TARE PVYHAAGGVNS YRPRLNNGEN I SFIDKGKGE LI LTSN INQGAGGLY 

FQGDFTVSPENNETWQGAGVHISEDSTVTWKVNGVANDRLSKIGKGT 
I : I : I I I I I : I I II I I I I I I I I I : I I I I I I I I I II I I I I I I I I I I I 
FEGNFTVSPKNNETWQGAGVHISDGSTVTWKVNGVANDRLSKIGKGTLLVQAKGENQGSV 

// 

DKVT AS LTKT D I SGNVDLADHAHLNLTGLA 
III I I I : I I I : 111:1111111111111 
FGVAPHQSHTICTRSDWTGLTSCTEKTITDDKVIASLSKTDVRGNVSLADHAHLNLTGLA 



LPSGXELGNLNLDNATITLNSAYRHDAAGAQTGSATDAPRRRSRRSRRSLLXVTPPTSVE 
I I I I : I I I I I I I I I I I i I I I I I I I I M I I I I I I I I : II I I II I I I i II 111111:1 
LPSGTELGNLNLDNATITLNSAYRHDAAGAQTGSAADAPRRRSRRS LLSVTPPTSAE 



60 



60 



120 



KGKFAVGAKDIEVYNKKGELVGKSMTKAPMIDFSWSRNGVAALVGVQYIVSVAHNGGYN 
I I I I I II I I I M I I I I I I I II I I I I I II I I I I I I I I I I M I I I I : I I I I I I I I I I I I I I 
KGKFAVGAKDIEVYNKKGELVGKSMTKAPMIDFSWSRNGVAALAGDQYIVSVAHNGGYN 120 

NVDFGAEGXNIXDQXRXTYKIVKRNNYKAGTKGHPYGGDYHMPRLHKXVTDAEPVEMTSY 180 
I I I I I I I I I II I : I : I I I I I II I I I I : I I I I I I I II I II I I I I I I I I I I I I I 11 
NVDFGAEGSN-PDQHRFSYQIVKRNNYKAGTNGHPYGGDYHMPRLHKFVTDAEPVEMTSY 179 



223 



239 



255 



GS PMFI YDA QKQKWLIN GVLOTGNPYIGKSNG 

I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I 
GGTVNLGSEKIKHSPY GFLPTGGSFGDSGS PMFI YDA QKQKWLIN GVLOTGNPYIGKSNG 289 

FOLVRKDWFYDEIFAGDTHSVFYEPRQNGKYSFNDDNNGTGKINAKHEHNSLPNRLKTRT 315 
I I I I I I I I I I II I I I I I I I II I I I I : I I I II I ! 1:1 It: M 1:1 11:1 Ml Mill I 
FOLVRKDWFYDEIFAGDTHSVFYEPHQNGKYFFNDNNNGAGKIDAKHKHYSLPYRLKTRT 359 



375 



422 



479 



744 



774 



803 



TLNGNLSANGDTR-YTVSHNATQNGNXSLVXNAQATFNQATLNGNTSASGNASFNLSDHA 
1:1111 ::::M : lllllll III II I I I I I I I II I I II I I I I I I 1 I I I :: I 
TFNGNL-VQAETRTIRLRANATQNGNLSLVGNAQATFNQATLNGNTSASDNASFNLSNNA 833 

VQNGSLTLSGNAKANVSHSALNGNVSLADKAVFHFESSRFTGQISGGKDTALHLKDSEWT 863 
MM Ml II I II I I I I I II I II I I I I I I I I M II I : I I II I M II I M I I M I M i I I I 
VQNGSLTLSDNAKANVSHSALNGNVSLADKAVFHFENSRFTGKISGGKDTALHLKDSEWT 893 



923 



950 



983 



SRFNT LT VNGKLNGQGT FR FMSELFG YRS DKLKLAE S SE GT YTLAVNNTGNE PAS LEQLT 
I I I I I I I II I II II I II II I I I I M 11 II I I I II I II I II I II II I I I I I II : I M M I 
SRFNTLTVNGKLNGQGTFRFMSELFGYRSGKLKLAESSEGTYTLAVNNTGNEPVS LEQLT 1010 

WEGKDNKPLSENLNFTLQNEHVDAGAW 1011 
lllllll I I I II M I I I I I II 1 I I I II 

WEGKDNTPLSENLNFTLQNEHVDAGAWRYQLIRKDGEFRLHNPVKEQELSDKLGKAGET 1070 

// 

LDRVFAEDRRNAVWTSGIRDTKHYRSQDFR 1211 
II I I 1 I It I I I I M I II II M I I M I II I I 
PQRDLISRYANSGLSEFSATLNSVFAVQDELDRVFAEDRRNAVWTSGIRDTKHYRSQDFR 1239 

AYRQQTDLRQIGMQKNLGSGRVGILFSHNRTENTFDDGIGNSARLAHGAVFGQYGIDRFY 1271 

I M I I I I II I I I I M I II II II I II I I II II I II I I II M M I II I I I 1 I II I I I II 
AYRQQTDLRQIGMQKNLGSGRVGILFSHNRTGNTFDDGIGNSARLAHGAVFGQYGIGRFD 1299 
orf 1 . pep IGISAGAGFSSGSLSDGIGXKXRRRVLHYGIQARYRAGFGGFGIEPHIGATRYFVQKADY 1331

I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I ! I I I I I I I I I I 
orflng IGISAGAGFSSGSLSDGIRGKIRRRVLHYGIQARYRAGFGGFGIEPHIGATRYFVQKADY 1359 

orf 1. pep RYENVNIATPGLAFNRYRAGIKADYSFKPAQHISITPYLSLSYTDAASGKVRTRVNTAVL 1391 

I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I 1 I I I I I I I I 
orflng RYENVNIATPGLAFNRYRAGIKADYSFKPAQHISITPYLSLSYTDAASGKVRTRVNTAVL 1419 

orf 1 .pep AQDFGKTRS AEWGVNAE I KG FT L S LHAAAAKG PQLE AQHS AG I KLG YRW 1440 

I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
orflng AQDFGKTRS AEWGVNAE I KG FT LS LHAAAAKG PQLEAQHS AG I KLG YRW 1468 
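When identity is reported separately for several blocks, a single pooled figure can be estimated by weighting each percentage by its overlap length. The following is a minimal Python sketch that uses the three blocks quoted above (83.5% over 467 aa, 88.3% over 298 aa, 97.7% over 259 aa). It assumes the blocks do not overlap one another. Because each block's identical-residue count is back-calculated from a rounded percentage, the pooled result is approximate. The function name `combined_identity` is illustrative.

```python
def combined_identity(blocks):
    """Pool (percent_identity, overlap_length) blocks into one overall figure.

    Assumes the blocks are non-overlapping. Identical-residue counts are
    back-calculated from the rounded percentages, so the result is approximate.
    """
    identical = sum(pct / 100.0 * length for pct, length in blocks)
    total = sum(length for _, length in blocks)
    return identical, total, 100.0 * identical / total


blocks = [(83.5, 467), (88.3, 298), (97.7, 259)]
identical, total, pct = combined_identity(blocks)
print(f"~{identical:.0f} identical residues over {total} aligned positions ({pct:.1f}%)")
```

This prints roughly 906 identical residues over 1024 aligned positions, or about 88.5%. The pooled figure covers only the aligned blocks, not the regions between them.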

The full-length ORF1ng nucleotide sequence was identified <SEQ ID 653>:

1 ATGAAAACAA CCGACAAACG GACAACCGAA ACACACCGCA AAGCCCCTAA 

51 AACCGGCCGC ATCCGCTTCT CGCCCGCTTA CTTAGCCATA TGCCTGTCGT 

101 TCGGCATTCT GCCCCAAGCC CGGGCGGGAC ACACTTATTT CGGCATCAAC 

151 TACCAATACT ATCGCGACTT TGCCGAAAAT AAAGGCAAGT TTGCAGTCGG 

201 GGCGAAAGAT ATTGAGGTTT ACAACAAAAA AGGGGAGTTG GTCGGCAAAT 

251 CGATGACGAA AGCCCCGATG ATTGATTTTT CTGTGGTATC GCGTAACGGC 

301 GTGGCGGCAT TGGCGGGCGA TCAATATATT GTGAGCGTGG CACATAACGG 

351 CGGCTATAAC AATGTTGATT TTGGTGCGGA GGGAAGCAAT CCCGATCAGC 

401 ACCGCTTTTC TTACCAAATT GTGAAAAGAA ATAATTATAA AGCAGGGACT 

451 AACGGCCATC CTTATGGCGG CGATTATCAT ATGCCGCGTT TGCACAAATT 

501 TGTCACAGAT GCAGAACCTG TTGAGATGAC CAGTTATATG GATGGGTGGA 

551 AATACGCTGA TTTAAATAAA TACCCTGATC GTGTTCGAAT CGGAGCAGGC 

601 AGACAATATT GGCGGTCTGA TGAAGACGAA CCCAATAACC GCGAAAGTTC 

651 ATATCATATT GCAAGCGCAT ATTCTTGGCT CGTCGGTGGC AATACCTTTG 

701 CACAAAATGG ATCAGGTGGT GGCACAGTCA ACTTAGGTAG CGAAAAAATT 

751 AAACATAGCC CATATGGTTT TTTACCAACA GGAGGCTCAT TTGGCGACAG

801 TGGCTCACCA ATGTTTATCT ATGATGCCCA AAAGCAAAAG TGGTTAATTA 

851 ATGGGGTATT GCAAACAGGC AACCCCTATA TAGGAAAAAG CAATGGCTTC 

901 CAGCTAGTTC GTAAAGATTG GTTCTATGAT GAAATCTTTG CTGGAGATAC 

951 CCATTCAGTA TTCTACGAAC CACATCAAAA TGGGAAATAC TTTTTTAACG 

1001 ACAATAATAA TGGCGCAGGA AAAATCGATG CCAAACATAA ACACTATTCT 

1051 CTACCTTATA GATTAAAAAC ACGAACCGTT CAATTGTTTA ATGTTTCTTT 

1101 ATCCGAGACA GCAAGAGAAC CTGTTTATCA TGCTGCAGGT GGGGTCAACA 

1151 GTTATCGACC CAGACTGAAT AATGGAGAAA ATATTTCCTT TATTGACAAA 

1201 GGAAAAGGTG AATTGATACT TACCAGCAAC ATCAACCAAG GCGCGGGCGG 

1251 TTTGTATTTT GAGGGTAATT TTACGGTCTC GCCTAAAAAC AACGAAACGT 

1301 GGCAAGGCGC GGGCGTTCAT ATCAGTGATG GCAGTACCGT TACTTGGAAA 

1351 GTAAACGGCG TGGCAAACGA CCGCCTGTCC AAAATCGGCA AAGGCACGCT 

1401 GCTGGTTCAA GCCAAAGGGG AAAACCAAGG CTCGGTCAGC GTGGGCGACG 

1451 GTAAAGTCAT CTTAGATCAG CAGGCGGACG ATCAAGGCAA AAAACAAGCC 

1501 TTTAGTGAAA TCGGCTTGGT CAGCGGCAGG GGGACGGTGC AACTGAATGC 

1551 CGATAATCAG TTCAACCCCG ACAAACTCTA TTTCGGCTTT CGCGGCGGAC 

1601 GTTTGGATTT GAACGGGCAT TCGCTTTCGT TCCACCGCAT TCAAAATACC 

1651 GATGAAGGGG CGATGATTGT CAACCACAAT CAAGACAAAG AATCCACCGT 

1701 TACCATTACA GGCAATAAAG ATATTACTAC AACCGGCAAT AACAACAACT 

1751 TGGATAGCAA AAAAGAAATT GCCTACAACG GTTGGTTTGG CGAGAAAGAT 

1801 GCAACCAAAA CGAACGGGCG GCTCAATCTG AATTACCAAC CGGAAGAAGC 

1851 GGATCGCACT TTACTGCTTT CCGGCGGAAC AAATTTAAAC GGCAATATCA 

1901 CGCAAACAAA CGGCAAACTG TTTTTCAGCG GCAGACCGAC ACCGCACGCC 

1951 TACAATCATT TAGGAAGCGG GTGGTCAAAA ATGGAAGGTA TCCCACAAGG 

2001 AGAAATCGTG TGGGACAACG ATTGGATCGA CCGCACATTT AAAGCGGAAA 

2051 ACTTCCATAT TCAGGGCGGA CAAGCGGTGG TTTCCCGCAA TGTTGCCAAA 

2101 GTGGAAGGCG ATTGGCATTT AAGCAATCAC GCCCAAGCAG TTTTCGGTGT 

2151 CGCACCGCAT CAAAGCCACA CAATCTGTAC ACGTTCGGAC TGGACGGGTC 

2201 TGACAAGTTG TACCGAAAAA ACCATTACCG ACGATAAAGT GATTGCTTCA 

2251 TTGAGCAAGA CCGACATCAG AGGCAATGTC AGCCTTGCCG ATCACGCTCA 

2301 TTTAAATCTC ACAGGACTTG CCACACTCAA CGGCAATCTT AGTGCAGGCG 

2351 GAGACACGCA CTATACGGTT ACGCGCAACG CCACCCAAAA CGGCAACCTC 

2401 AGCCTCGTGG GCAATGCCCA AGCAACATTT AATCAAGCCA CATTAAACGG 

2451 CAACACATCG GCTTCGGACA ATGCTTCATT TAATCTAAGC AACAACGCCG 

2501 TACAAAACGG CAGTCTGACG CTTTCCGACA ACGCTAAGGC AAACGTAAGC 

2551 CATTCCGCAC TCAACGGCAA TGTCTCCCTA GCCGATAAGG CAGTATTCCA 

2601 TTTTGAAAAC AGCCGCTTTA CCGGAAAAAT CAGCGGCGGC AAGGATACGG 

2651 CATTACACTT AAAAGACAGC GAATGGACGC TGCCGTCGGG CACGGAATTA 

2701 GGCAATTTAA ACCTTGACAA CGCCACCATT ACACTCAATT CCGCCTATCG 

2751 ACACGATGCG GCAGGCGCGC AAACCGGCAG TGCGGCAGAT GCGCCGCGCC 

2801 GCCGTTCGCG CCGTTCCCTA TTATCCGTTA CGCCGCCAAC TTCGGCAGAA 

2851 TCCCGTTTCA ACACGCTGAC GGTAAACGGC AAATTGAACG GTCAGGGAAC 
2901 ATTCCGCTTT ATGTCGGAAC TCTTCGGCTA CCGCAGCGGC AAATTGAAGC

2951 TGGCGGAAAG TTCCGAAGGC ACTTACACCT TGGCTGTCAA CAATACCGGC

3001 AACGAACCCG TAAGTCTCGA GCAATTGACG GTAGTGGAAG GAAAAGACAA

3051 CACACCGCTG TCCGAAAATC TTAATTTCAC CCTGCaaaAc gaacacgtcg

3101 atgccggcgc atggCGTTAT CAGCTTATCC gcaaagacgG CGAGTTCCgc

3151 CTGCATAATC CGGTCAAAGA ACAAGAGCTT TCCGACAAAC TCGGCAAGgc

3201 gggagaaACA GAggccgccT TGACGGCAAA ACAGGCacaA CTTGCCGCCA

3251 AAcaacaggc ggaaaAAGAC AACgcgcaaa gccttgAcgc gctgattgcg

3301 gCcgggcgca atgccaccga AAAGGCAgaa agtgttgccg aaccgGCCCG

3351 GCAGGCAGGC GGGGAAAAtg ccgGCATTAT GCAGGCGGAG GAAGAGAAAA

3401 AACGGGTGCA GGCGGATAAA GACACCGCCT TGGCGAAACA GCGCGAAGCG

3451 GAAACCCGGC CGGCTACCAC CGCCTTCCCC CGCGCCCGCC GCGCCCGCCG

3501 GGATTTGCCG CAACCGCAGC CCCAACCGCA ACCCCAACCG CAGCGCGACC

3551 TGATCAGCCG TTATGCCAAT AGCGGTTTGA GTGAATTTTC CGCCACGCTC

3601 AACAGCGTTT TCGCCGTACA GGACGAATTG GACCGCGTGT TTGCCGAAGA

3651 CCGCCGCAAC GCCGTTTGGA CAAGCGGCAT CCGGGACACC AAACACTACC

3701 GTTCGCAAGA TTTCCGCGCC TACCGCCAAC AAACCGACCT GCGCCAAATC

3751 GGTATGCAGA AAAACCTCGG CAGCGGGCGC GTCGGCATCC TGTTTTCGCA

3801 CAACCGGACC GGAAACACCT TCGACGACGG CATCGGCAAC TCGGCACGGC

3851 TTGCCCACGG TGCCGTTTTC GGGCAATACG GCATCGGCAG GTTCGACATC

3901 GGCATCAGCG CGGGCGCGGG TTTTAGTAGC GGCAGCCTTT CAGACGGCAT

3951 CAGAGGCAAA ATCCGCCGCC GCGTGCTGCA TTACGGCATT CAGGCAAGAT

4001 ACCGCGCAGG TTTCGGCGGA TTCGGCATCG AACCGCACAT CGGCGCAACG

4051 CGCTATTTCG TCCAAAAAGC GGATTACCGA TACGAAAACG TCAATATCGC

4101 CACCCCGGGC CTTGCATTCA ACCGCTACCG CGCGGGCATT AAGGCAGATT

4151 ATTCATTCAA ACCGGCGCAA CACATTTCCA TCACGCCTTA TTTGAGCCTG

4201 TCCTATACCG ATGCCGCTTC CGGCAAAGTC CGAACGCGCG TCAATACCGC

4251 CGTATTGGCG CAGGATTTCG GCAAAACCCG CAGTGCGGAA TGGGGCGTAA

4301 ACGCCGAAAT CAAAGGTTTC ACGCTGTCCC TCCACGCTGC CGCCGCCAAG

4351 GGGCCGCAAT TGGAAGCGCA GCACAGCGCG GGCATCAAAT TAGGCTACCG

4401 CTGGTAA


This is predicted to encode a protein having amino acid sequence <SEQ ID 654>: 



1 MKTTDKRTTE THRKAPKTGR IRFSPAYLAI CLSFGILPQA RAGHTYFGIN
51 YQYYRDFAEN KGKFAVGAKD IEVYNKKGEL VGKSMTKAPM IDFSVVSRNG
101 VAALAGDQYI VSVAHNGGYN NVDFGAEGSN PDQHRFSYQI VKRNNYKAGT
151 NGHPYGGDYH MPRLHKFVTD AEPVEMTSYM DGWKYADLNK YPDRVRIGAG
201 RQYWRSDEDE PNNRESSYHI ASAYSWLVGG NTFAQNGSGG GTVNLGSEKI
251 KHSPYGFLPT GGSFGDSGSP MFIYDAQKQK WLINGVLQTG NPYIGKSNGF
301 QLVRKDWFYD EIFAGDTHSV FYEPHQNGKY FFNDNNNGAG KIDAKHKHYS
351 LPYRLKTRTV QLFNVSLSET AREPVYHAAG GVNSYRPRLN NGENISFIDK
401 GKGELILTSN INQGAGGLYF EGNFTVSPKN NETWQGAGVH ISDGSTVTWK
451 VNGVANDRLS KIGKGTLLVQ AKGENQGSVS VGDGKVILDQ QADDQGKKQA
501 FSEIGLVSGR GTVQLNADNQ FNPDKLYFGF RGGRLDLNGH SLSFHRIQNT
551 DEGAMIVNHN QDKESTVTIT GNKDITTTGN NNNLDSKKEI AYNGWFGEKD
601 ATKTNGRLNL NYQPEEADRT LLLSGGTNLN GNITQTNGKL FFSGRPTPHA
651 YNHLGSGWSK MEGIPQGEIV WDNDWIDRTF KAENFHIQGG QAVVSRNVAK
701 VEGDWHLSNH AQAVFGVAPH QSHTICTRSD WTGLTSCTEK TITDDKVIAS
751 LSKTDVRGNV SLADHAHLNL TGLATFNGNL VQAETRTIRL RANATQNGNL
801 SLVGNAQATF NQATLNGNTS ASDNASFNLS NNAVQNGSLT LSDNAKANVS
851 HSALNGNVSL ADKAVFHFEN SRFTGKISGG KDTALHLKDS EWTLPSGTEL
901 GNLNLDNATI TLNSAYRHDA AGAQTGSAAD APRRRSRRSL LSVTPPTSAE
951 SRFNTLTVNG KLNGQGTFRF MSELFGYRSG KLKLAESSEG TYTLAVNNTG
1001 NEPVSLEQLT VVEGKDNTPL SENLNFTLQN EHVDAGAWRY QLIRKDGEFR
1051 LHNPVKEQEL SDKLGKAGET EAALTAKQAQ LAAKQQAEKD NAQSLDALIA
1101 AGRNATEKAE SVAEPARQAG GENAGIMQAE EEKKRVQADK DTALAKQREA
1151 ETRPATTAFP RARRARRDLP QPQPQPQPQP QRDLISRYAN SGLSEFSATL
1201 NSVFAVQDEL DRVFAEDRRN AVWTSGIRDT KHYRSQDFRA YRQQTDLRQI
1251 GMQKNLGSGR VGILFSHNRT GNTFDDGIGN SARLAHGAVF GQYGIGRFDI
1301 GISAGAGFSS GSLSDGIRGK IRRRVLHYGI QARYRAGFGG FGIEPHIGAT
1351 RYFVQKADYR YENVNIATPG LAFNRYRAGI KADYSFKPAQ HISITPYLSL
1401 SYTDAASGKV RTRVNTAVLA QDFGKTRSAE WGVNAEIKGF TLSLHAAAAK
1451 GPQLEAQHSA GIKLGYRW*



Underlined and double-underlined sequences represent, respectively, the active site of a serine protease (trypsin family) and an ATP/GTP-binding site motif A (P-loop).
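Short sequence motifs of this kind are usually located by pattern matching. The following is a minimal Python sketch that assumes the PROSITE patterns PS00017 (ATP/GTP-binding site motif A, P-loop) and PS00135 (trypsin-family serine protease active site), transcribed into regular expressions as shown in the comments. Check that transcription against the current PROSITE entries before relying on it. The text does not say which residues are underlined, so the hits below are illustrative and do not reconstruct the original markup. The names `MOTIFS` and `find_motifs` are hypothetical.

```python
import re

# Assumed transcriptions of PROSITE patterns (verify against PROSITE):
#   PS00017 P-loop:  [AG]-x(4)-G-K-[ST]
#   PS00135 trypsin Ser active site:
#     [DNSTAGC]-[GSTAPIMVQH]-x(2)-G-[DE]-S-G-[GS]-[SAPHV]-[LIVMFYWH]-[LIVMFYSTANQH]
# Each pattern is wrapped in a lookahead so that overlapping matches are also reported.
MOTIFS = {
    "P-loop (PS00017)": re.compile(r"(?=([AG].{4}GK[ST]))"),
    "trypsin Ser active site (PS00135)": re.compile(
        r"(?=([DNSTAGC][GSTAPIMVQH].{2}G[DE]SG[GS][SAPHV][LIVMFYWH][LIVMFYSTANQH]))"
    ),
}


def find_motifs(protein: str) -> list[tuple[str, int, str]]:
    """Return (motif name, 1-based start position, matched residues) for every hit."""
    hits = []
    for name, pattern in MOTIFS.items():
        for match in pattern.finditer(protein):
            hits.append((name, match.start() + 1, match.group(1)))
    return hits


# Residues 251-300 of <SEQ ID 654>.
window = "KHSPYGFLPTGGSFGDSGSPMFIYDAQKQKWLINGVLQTGNPYIGKSNGF"
for name, start, residues in find_motifs(window):
    print(f"{name}: {residues} at residue {250 + start}")
```

On this window the sketch reports a P-loop match, GNPYIGKS, at residue 290, and a trypsin-family serine-protease match, GGSFGDSGSPMF, at residue 261.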



ORF1-1 and ORF1ng show 93.7% identity in a 1471 aa overlap:
10 20 30 40 50 60

MKTT DKRTTETHRKAPKTGR I RFS PAYLAICLS FGI LPQAWAGHT YFG IN YQYYRDFAEN 

I I I I | | I I I I I I I I I I I I I I i I I I I I I I I I I I I I ! I I I I I MIMiillllllilllll 

MKTT DKRTTETHRKAPKTGRIRFS PAYLAICLS FGI LPQARAGHTYFGINYQYYRDFAEN 
10 20 30 40 50 60 

70 80 90 100 110 120 

KGKFAVGAKDIEVYNKKGELVGKSMTKAPMIDFSWSRNGVAALVGDQYIVSVAHNGGYN 

I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I M I I I I I I :| I I I I I I I I I I t I M 
KGKFAVG AKD I EVYNKKGE LVGKSMTKAPM I D FS WS RNG VAALAG DQ Y I V S VAHNGG YN 
70 80 90 100 110 120 

130 140 150 160 170 180 

NVDFGAEGRNPDQHRFTYKIVKRNNYKAGTKGHPYGGDYHMPRLHKFVTDAEPVEMTSYM 

I I I 1 | M I I I I I I I I : I : I I I I I I I I I II : I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
NVDFGAEGSNPDQHRFSYQIVKRNNYKAGTNGHPYGGDYHMPRLHKFVTDAEPVEMTSYM 
130 140 150 160 170 180 

190 200 210 220 230 240 

DGRKYIDQNNYPDRVRIGAGRQYWRSDEDEPNNRESSYHIASAYSWLVGGNTFAQNGSGG 

Mill I : I II M I I I M I I I I I I I I I I I I I I M I I M I I I I I II II II I I I I I I I II 
DGWKYADLNKYPDRVRIGAGRQYWRSDEDEPNNRESSYHIASAYSWLVGGNTFAQNGSGG 
190 200 210 220 230 240 

250 260 270 280 290 300 

GTVNLGSEKIKHSPYGFLPTGGSFGDSGSPMFIYDAQKQKWLINGVLQTGNPYIGKSNGF 

1 IE I I I I I I M I M I I I I I M I I I I I I I I I I I I I I I I I I M I M M I I I M II I I I I I I I 
GTVNLGSEKIKHSPYGFLPTGGSFGDSGSPMFIYDAQKQKWLINGVLQTGNPYIGKSNGF 
250 260 270 280 290 300 

310 320 330 340 350 360 

QLVRKDWFYDEIFAGDTHSVFYEPRQNGKYSFNDDNNGTGKINAKHEHNSLPNRLKTRTV 

MINIMI! MIMIhllMI lll:IN:IM:lll:l Ml I 

QLVRKDWFYDEIFAGDTHSVFYEPHQNGKYFFNDNNNGAGKIDAKHKHYSLPYRLKTRTV 
310 320 330 340 350 360 

370 380 390 400 410 420 

QLFNVSLSETAREPVYHAAGGVNSYRPRLNNGENISFIDEGKGELILTSNINQGAGGLYF 

I II II II I I I I I I I I I M I I II I I I II II I I I I I II I M M I I I I I I I I I I I I I I M M I 
QLFNVS LSETARE PVYHAAGGVNS YRPRLNNGEN I S FI DKGKGELI LTSN INQGAGGLYF 
370 380 390 400 410 420 

430 440 450 460 470 480 

QGDFTVSPENNETWQGAGVHISEDSTVTWKVNGVANDRLSKIGKGTLHVQAKGENQGSIS 
M: I I II Ml I III I II M M I: II I I I I I I I I I M 1 I I I I I I I I I IIIMIIII|:| 
EGNFTVSPKNNETWQGAGVHISDGSTVTWKVNGVANDRLSKIGKGTLLVQAKGENQGSVS 

430 440 450 460 470 480 

490 500 510 520 530 540 

VGDGTVILDQQADDKGKKQAFSEIGLVSGRGTVQLNADNQFNPDKLYFGFRGGRLDLNGH 
MM I I I II I I I I : I I I I I I II I I I I M I I I I II I I I I I M I i M M M I I I I I M I M 
VGDGKVILDQQADDQGKKQAFSEIGLVSGRGTVQLNADNQFNPDKLYFGFRGGRLDLNGH 

490 500 510 520 530 540 

550 560 570 580 590 600 

SLSFHRIQNTDEGAMIVNHNQDKESTVTITGNKDIATTGNNNSLDSKKEIAYNGWFGEKD 

I I I I I II I I I II I I I I I I I I I I I I I I I I II I I II I :\ I I I II : I II I I I II II II I I I I I 
SLSFHRIQNTDEGAMIVNHNQDKESTVTITGNKDITTTGNNNNLDSKKEIAYNGWFGEKD 
550 560 570 580 590 600 

610 620 630 640 650 660 

TTKTNGRLNLVYQPAAEDRTLLLSGGTNLNGNITQTNGKLFFSGRPTPHAYNHLNDHWSQ 
: I I I I I I I M Ml MMIMIIIMIillllllllllllllllllllMI:: II: 
ATKTNGRLNLNYQPEEADRTLLLSGGTNLNGNITQTNGKLFFSGRPTPHAYNHLGSGWSK 

610 620 630 640 650 660 

670 680 690 700 710 720 

KEGIPRGEIVWDNDWINRTFKAENFQIKGGQAWSRNVAKVKGDWHLSNHAQAVFGVAPH 

II I |:l M I M II I 1:11 I II M I: MM I I II I I M M Ml I II I I M I I I I II II I I 
MEGIPQGEIVWDNDWIDRTFKAENFHIQGGQAWSRNVAKVEGDWHLSNHAQAVFGVAPH 
670 680 690 700 710 720 



WO 99/24578 



PCT/IB98/01665 



-373- 






730 740 750 760 770 780 

orf1-1.pep QSHTICTRSDWTGLTNCVEKTITDDKVIASLTKTDISGNVDLADHAHLNLTGLATLNGNL
I I I I I 1 i I I I I I I I I : I : I I I I I I I I I I I I I : I I I I I I I : It I I I I I I I I I I I I I I I I I 
orf1ng-1 QSHTICTRSDWTGLTSCTEKTITDDKVIASLSKTDIRGNVSLADHAHLNLTGLATLNGNL

730 740 750 760 770 780 

790 800 810 820 830 840 

orf1-1.pep SANGDTRYTVSHNATQNGNLSLVGNAQATFNQATLNGNTSASGNASFNLSDHAVQNGSLT

I I : I I I : I I I : : I I II I I I I I I I I I I I I I I I I I I I I I I I i I I I I I I I II :: I M I I I I I 
orf1ng-1 SAGGDTHYTVTRNATQNGNLSLVGNAQATFNQATLNGNTSASDNASFNLSNNAVQNGSLT

790 800 810 820 830 840 

850 860 870 880 890 900 

orf1-1.pep LSGNAKANVSHSALNGNVSLADKAVFHFESSRFTGQISGGKDTALHLKDSEWTLPSGTEL

II I I I I I I II I I I I I I I I I I I I I I I I I I : I I I I I : I I i I I I I I I I I I I I I 1 I I I I I I I I 
orf1ng-1 LSDNAKANVSHSALNGNVSLADKAVFHFENSRFTGKISGGKDTALHLKDSEWTLPSGTEL

850 860 870 880 890 900 

910 920 930 940 950 960 

orf1-1.pep GNLNLDNATITLNSAYRHDAAGAQTGSATDAPRRRSRRSRRSLLSVTPPTSVESRFNTLT
I I I I I I I I ! I I I I I I I I I 1 I I I I I I I I I : I II I I I I I I I I I I I I I I I I : I I I I I I I I 
orf1ng-1 GNLNLDNATITLNSAYRHDAAGAQTGSAADAPRRRSR---RSLLSVTPPTSAESRFNTLT

910 920 930 940 950 

970 980 990 1000 1010 1020 

orf1-1.pep VNGKLNGQGTFRFMSELFGYRSDKLKLAESSEGTYTLAVNNTGNEPASLEQLTVVEGKDN

I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I : I I I I I I I I I I I I I 
orf1ng-1 VNGKLNGQGTFRFMSELFGYRSGKLKLAESSEGTYTLAVNNTGNEPVSLEQLTVVEGKDN

960 970 980 990 1000 1010 

1030 1040 1050 1060 1070 

orf1-1.pep KPLSENLNFTLQNEHVDAGAWRYQLIRKDGEFRLHNPVKEQELSDKLGKA

II I I I I I I I I I I I I I I I I I I II II I I M I I I I II I I I I I I I I I I I II I I 
orf1ng-1 TPLSENLNFTLQNEHVDAGAWRYQLIRKDGEFRLHNPVKEQELSDKLGKAGETEAALTAK
1020 1030 1040 1050 1060 1070

1080 1090 1100 1110 1120 

orf1-1.pep EAKKQAEKDNAQSLDALIAAGRDAVEKTESVAEPARQAGGENVGIMQAEEEKKRVQ

II : I I I I I I II I I I I I II I I I : I : I I : I I I I II I I I I I I I I : I I I I I I I I II I I I 
orf1ng-1 QAQLAAKQQAEKDNAQSLDALIAAGRNATEKAESVAEPARQAGGENAGIMQAEEEKKRVQ
1080 1090 1100 1110 1120 1130 






1130 1140 1150 1160 1170 1180 

orf1-1.pep ADKDTALAKQREAETRPATTAFPRARRARRDLPQLQPQPQPQPQRDLISRYANSGLSEFS
I I I II I I I I I I I I II I I I M I I I I I I I I I I II I I I I II I I I I I I I I I I I I I I I I I II I I 
orf1ng-1 ADKDTALAKQREAETRPATTAFPRARRARRDLPQPQPQPQPQPQRDLISRYANSGLSEFS
1140 1150 1160 1170 1180 1190 

1190 1200 1210 1220 1230 1240 

orf1-1.pep ATLNSVFAVQDELDRVFAEDRRNAVWTSGIRDTKHYRSQDFRAYRQQTDLRQIGMQKNLG

I I I I I I I I I I I I I I I II I I I I I I I I I I I II M II I I I I I I I I I I I I I I I I I I I II II I I I 
orf1ng-1 ATLNSVFAVQDELDRVFAEDRRNAVWTSGIRDTKHYRSQDFRAYRQQTDLRQIGMQKNLG
1200 1210 1220 1230 1240 1250 

1250 1260 1270 1280 1290 1300 

orf1-1.pep SGRVGILFSHNRTENTFDDGIGNSARLAHGAVFGQYGIDRFYIGISAGAGFSSGSLSDGI
I I I I I I I II I I I I I I I I I I I I I II I I I I I I I II I II I II I I I I I I 1 I I I I I I I I I I I 
orf1ng-1 SGRVGILFSHNRTGNTFDDGIGNSARLAHGAVFGQYGIGRFDIGISAGAGFSSGSLSDGI
1260 1270 1280 1290 1300 1310 

1310 1320 1330 1340 1350 1360 

orf1-1.pep GGKIRRRVLHYGIQARYRAGFGGFGIEPHIGATRYFVQKADYRYENVNIATPGLAFNRYR

I I I I I I I I I I I I I I I I I I II I I II I I I I I I I I I I II I I I I I I I I I M I M I I I I I II I I 
orf1ng-1 RGKIRRRVLHYGIQARYRAGFGGFGIEPHIGATRYFVQKADYRYENVNIATPGLAFNRYR
1320 1330 1340 1350 1360 1370 

1370 1380 1390 1400 1410 1420 

orf1-1.pep AGIKADYSFKPAQHISITPYLSLSYTDAASGKVRTRVNTAVLAQDFGKTRSAEWGVNAEI

I I I I I I I I I I I I II I M I 1 I I I I II I I I I I I I I I M I I I I I I I I I I I I I I II I I II I I I I 
orf1ng-1 AGIKADYSFKPAQHISITPYLSLSYTDAASGKVRTRVNTAVLAQDFGKTRSAEWGVNAEI

1380 1390 1400 1410 1420 1430 
1430 1440 1450

orf1-1.pep KGFTLSLHAAAAKGPQLEAQHSAGIKLGYRWX

I I I I I I I I I I I M I II I I I I I I M I I I I I I I I 
orf1ng-1 KGFTLSLHAAAAKGPQLEAQHSAGIKLGYRWX

1440 1450 1460



In addition, ORF1ng shows 55.7% identity with the hap protein (P45387) over a 1455aa overlap:

SCORES Init1: 1104 Initn: 4632 Opt: 2680

Smith-Waterman score: 5165; 55.7% identity in 1455 aa overlap

10 20 30 40 50 60

orf1ng-1.pep MKTTDKRTTETHRKAPKTGRIRFSPAYLAICLSFGILPQARAGHTYFGINYQYYRDFAEN
I :|: |:|:ll: II I I I I I I I I : I I I I I I I I I I
p45387 MKKTVFRLNFLTACISLGIVSQAWAGHTYFGIDYQYYRDFAEN
10 20 30 40

70 80 90 100 110 120

orf1ng-1.pep KGKFAVGAKDIEVYNKKGELVGKSMTKAPMIDFSWSRNGVAALAGDQYIVSVAHNGGYN
M I |: I I I:: I: II I !:|:| I I I I I I I I I I I I I I I I I I I I I I I : :llll IIMI II:
p45387 KGKFTVGAQNIKVYNKQGQLVGTSMTKAPMIDFSWSRNGVAALVENQYIVSVAHNVGYT
50 60 70 80 90 100

130 140 150 160 170 180

orf1ng-1.pep NVDFGAEGSNPDQHRFSYQIVKRNNYKAGTNGHPYGGDYHMPRLHKFVTDAEPVEMTSYM
:| III 111:11 11111:1:1 lllllll I III I I I I I I I I I M : I I :: I I I I
p45387 DVDFGAEGNNPDQHRFTYKIVKRNNYKKD-NLHPYEDDYHNPRLHKFVTEAAPIDMTSNM
110 120 130 140 150 160

190 200 210 220 230 240

orf1ng-1.pep DGWKYADLNKYPDRVRIGAGRQYWRSDEDEPNNRESSYHIASAYSWLVGGNTFAQNGSGG
: I I : I : t I I : I I 1 1 I : I I I : I I : I : I : : : : I : I I : I : : I I I I I : I :
p45387 NGSTYSDRTKYPERVRIGSGRQFWRNDQDKGD QVAGAYHYLTAGNTHNQRGAGN
170 180 190 200 210

250 260 270 280 290 300

orf1ng-1.pep GTVNLGSEKIKHSPYGFLPTGGSFGDSGSPMFIYDAQKQKWLINGVLQTGNPYIGKSNGF
I II:: I : I I I I : I I I I I I I I I I I I I I : I I I I I I I I : I : I I I : I I I I I
p45387 GYSYLGGDVRKAGEYGPLPIAGSKGDSGSPMFIYDAEKQKWLINGILREGNPFEGKENGF
220 230 240 250 260 270

310 320 330 340 350 360

orf1ng-1.pep QLVRKDWFYDEIFAGDTHSVFYEPHQNGKYFFNDNNNGAGKIDAKHKHYSLPYRLKTRTV
II III::! MM I I: M II I :: MM Ml I ::l ::l :
p45387 QLVRKSYF-DEIFERDLHTSLYTRAGNGVYTISGNDNGQGSITQKS GIPSEIK I
280 290 300 310 320

370 380 390 400 410 419

orf1ng-1.pep QLFNVSLSETAREPVYHAA-GGVNSYRPRLNNGENISFIDKGKGELILTSNINQGAGGLY
11:11 :: M: I I I M I I I I I : : I : M : I I I : : I : M I M I I 1 I
p45387 TLANMSLPLKEKDKVHNPRYDGPNIYSPRLNNGETLYFMDQKQGSLIFASDINQGAGGLY
330 340 350 360 370 380

420 430 440 450 460 470 479

orf1ng-1.pep FEGNFTVSPKNNETWQGAGVHISDGSTVTWKVNGVANDRLSKIGKGTLLVQAKGENQGSV
I M I I I I I I :: M 1 I I I I I : h I :: I M 1 M I I I I : I I I I I 1 I I I I I I I I I I I I : I I :
p45387 FEGNFTVSPNSNQTWQGAGIHVSENSTVTWKVNGVEHDRLSKIGKGTLHVQAKGENKGSI
390 400 410 420 430 440

480 490 500 510 520 530 539

orf1ng-1.pep SVGDGKVILDQQADDQGKKQAFSEIGLVSGRGTVQLNADNQFNPDKLYFGFRGGRLDLNG
[ I I I II I I I : I I I I I I I : I I II I I II I I I I II II I I I MIM I M I I I II I II II I I I
p45387 SVGDGKVILEQQADDQGNKQAFSEIGLVSGRGTVQLNDDKQFDTDKFYFGFRGGRLDLNG
450 460 470 480 490 500

540 550 560 570 580 590

orf1ng-1.pep HSLSFHRIQNTDEGAMIVNHNQDKESTVTITGNKDITT-TGNN-NNLDSKKEIAYNGWFG
| I I : M I I I I II I I I I I I I I I : : : I I II I M : M : I II I : II : I I I I I I I I I I
p45387 HSLTFKRIQNTDEGAMIVNHNTTQAANVTITGNESIVLPNGNNINKLDYRKEIAYNGWFG
510 520 530 540 550 560
600 610 620 630 640 650

orf1ng-1.pep EKDATKTNGRLNLNYQPEEADRTLLLSGGTNLNGNITQTNGKLFFSGRPTPHAYNHLGSG

I I : I I I I I I I 1:1 I I I I I I I : I : I I i I : I I I I I I I I I ! II:: 

p45387 ETDKNKHNGRLNLIYKPTTEDRTLLLSGGTNLKGDITQTKGKLFFSGRPTPHAYNHLNKR 
570 580 590 600 610 620

660 670 680 690 700 710 

orf1ng-1.pep WSKMEGIPQGEIVWDNDWIDRTFKAENFHIQGGQAWSRNVAKVEGDWHLSNHAQAVFGV
I I : I I I I I M I I I I I : I I I : I I I I I I 1 I : I : I I : I I I I I I I ::: I I : I : I I : I : I : I I I 
p45387 WSEMEGIPQGEIVWDHDWINRTFKAENFQIKGGSAWSRNVSSIEGNWTVSNNANATFGV

630 640 650 660 670 680 

720 730 740 750 760 770 

orf1ng-1.pep APHQSHTICTRSDWTGLTSCTEKTITDDKVIASLSKTDIRGNVSLADHAHLNLTGLATLN
3 : I : I : : I I I I I I I I I I I I : I : s|I III Is llsl |sss|s|s| I : I I I II

p45387 VPNQQNTICTRSDWTGLTTCQKVDLTDTKVINSIPKTQINGSINLTDNATANVKGLAKLN
690 700 710 720 730 740 

780 790 800 810 820 830 

orf1ng-1.pep GNLSAGGDTHYTVTRNATQNGNLSLVGNAQATFNQATLNGNTSASDNASFNLSNNAVQNG

II:: : : : : : I : I I I I I : I I 

p45387 GNVTL TNHSQFTLSNNATQIG 

750 760 770 

840 850 860 870 880 890

orf1ng-1.pep SLTLSDNAKANVSHSALNGNVSLADKAVFHFENSRFTGKISGGKDTALHLKDSEWTLPSG

: : MM: | : | : : : Mill I : I : I I : : I I : I : : I : I I I : : I : : : II : II 
p45387 NIRLSDNSTATVDNANLNGNVHLTDSAQFSLKNSHFSHQIQGDKGTTVTLENATWTMPSD 

780 790 800 810 820 830

900 910 920 930 940 950 

orf1ng-1.pep TELGNLNLDNATITLNSAYRHDAAGAQTGSAADAPRRRSRRSLLSVTPPTSAESRFNTLT
I | | | : I : I : 1 I M I II I : : I : : : I I I I I I : I I I I I I I I I I II 

p45387 TTLQNLTLNNSTITLNSAY S AS SNNT PRRRS LETETTPTSAEHRFNTLT

840 850 860 870

960 970 980 990 1000 1010 

orf1ng-1.pep VNGKLNGQGTFRFMSELFGYRSGKLKLAESSEGTYTLAVNNTGNEPVSLEQLTVVEGKDN
I I I I I : I I I I I : I I 1111:1 I II I : : : : II I Ml I I M I I : I I I I I : I I : I I I 
p45387 VNGKLSGQGTFQFTSSLFGYKSDKLKLSNDAEGDYILSVRNTGKEPETLEQLTLVESKDN

880 890 900 910 920 930 

1020 1030 1040 1050 1060 1070 

orf1ng-1.pep TPLSENLNFTLQNEHVDAGAWRYQLIRKDGEFRLHNPVKEQELSDKLGKAGETEAALTAK
1 I I I : : I : II I : I : I I II I I I I : I : : : I II I I I M I : I II I I : I : I : : I : I I I

p45387 QPLSDKLKFTLENDHVDAGALRYKLVKNDGEFRLHNPIKEQELHNDLVRAEQAERTLEAK

940 950 960 970 980 990 

1080 1090 1100 1110 1120 1130 

orf1ng-1.pep QAQLAAKQQAEKDNAQSLDALIAAGRNAT-EKAESVAEPARQAGGENAGIMQAEEEKKRV

I ... I I I ... : . I | || . . . * • | |:|| : | : : : : : | : I 
p45387 QVEPTAKTQTGEPKVRSRRAARAAFPDTLPDQSLLNALEAKQAE-LTAETQKSKAKTKKV

1000 1010 1020 1030 1040 1050 

1140 1150 1160 1170 1180 1190

orf1ng-1.pep QADK DTALAKQREAETRPATTAFPRARRARRD-LPQPQPQPQPQPQRDLISRYANSG

: : : : | I : I : : : : : : : I I I : : I : I : I I I I I I : I I : 

p45387 RSKRAVFSDPLLDQSLFALEAALEVIDAPQQSEKDRLAQEEAEKQ-RKQKDLISRYSNSA

1060 1070 1080 1090 1100 1110 


1200 1210 1220 1230 1240 1250 

orf1ng-1.pep LSEFSATLNSVFAVQDELDRVFAEDRRNAVWTSGIRDTKHYRSQDFRAYRQQ-TDLRQIG
I II : I I M I I : : : I II II M : M : : : : II I M : I : : I M I I I M II I : II I I I 
p45387 LSELSATVNSMLSVQDELDRLFVDQAQSAVWTNIAQDKRRYDSDAFRAYQQQKTNLRQIG 
1120 1130 1140 1150 1160 1170

1260 1270 1280 1290 1300 1310 

orf1ng-1.pep MQKNLGSGRVGILFSHNRTGNTFDDGIGNSARLAHGAVFGQYGIGRFDIGISAGAGFSSG
: I I I : : I M I : II I : I : M II : : I I M : I : I I I : : : I : : : I : I : I : : 
p45387 VQKALANGRIGAVFSHSRSDNTFDEQVKNHATLTMMSGFAQYQWGDLQFGVNVGTGISAS

1180 1190 1200 1210 1220 1230 
1320 1330 1340 1350 1360 1370

orf1ng-1.pep SLSDGIRGKIRRRVLHYGIQARYRAGFGGFGIEPHIGATRYFVQKADYRYENVNIATPGL
.... | | . | .... | | .. | | . . | . | | . | . . | . . | | | . . . . | . | . | . | j ; |
p45387 KMAEEQSRKIHRKAINYGVNASYQFRLGQLGIQPYFGVNRYFIERENYQSEEVRVKTPSL
1240 1250 1260 1270 1280 1290

1380 1390 1400 1410 1420 1430

orf1ng-1.pep AFNRYRAGIKADYSFKPAQHISITPYLSLSYTDAASGKVRTRVNTAVLAQDFGKTRSAEW
Mill lll::M:| |:::||: II: : : I : I : : : : : I : I II :|| I II: : I
p45387 AFNRYNAGIRVDYTFTPTDNISVKPYFFVNYVDVSNANVQTTVNLTVLQQPFGRYWQKEV
1300 1310 1320 1330 1340 1350

1440 1450 1460 1469

orf1ng-1.pep GVNAEIKGFTLSLHAAAAKGPQLEAQHSAGIKLGYRWX
|::IM 1:1 : ::| II |:::|:IMIII
p45387 GLKAEILHFQISAFISKSQGSQLGKQQNVGVKLGYRW
1360 1370 1380 1390 
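The SCORES and Smith-Waterman lines that head this alignment summarise the comparison; the Smith-Waterman score is the score of the best local alignment between the two proteins. The sketch below computes such a score by dynamic programming with a linear gap penalty. The match, mismatch and gap values are illustrative assumptions only: protein searches normally score residue pairs with a substitution matrix (for example BLOSUM50) and affine gap penalties, and the parameters behind the reported score of 5165 are not given here, so this sketch will not reproduce that number.

```python
def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -2) -> int:
    """Best local-alignment score of a and b with a linear gap penalty.

    The scoring values are illustrative assumptions, not the parameters
    used for the comparisons reported in this document.
    """
    prev = [0] * (len(b) + 1)  # DP row for the previous residue of a
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # residue of a aligned to a gap
            left = curr[j - 1] + gap  # residue of b aligned to a gap
            # Clamping at zero lets a local alignment start anywhere.
            curr[j] = max(0, diag, up, left)
            best = max(best, curr[j])
        prev = curr
    return best


if __name__ == "__main__":
    # Tail of the first orf1ng-1 row and the matching stretch of p45387.
    print(smith_waterman_score("AGHTYFGINYQYYRDFAEN", "AWAGHTYFGIDYQYYRDFAEN"))
```

Only the score is tracked here; recovering the aligned residues would additionally require a traceback from the highest-scoring cell.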



Based on this analysis, it is predicted that these proteins from N. meningitidis and N. gonorrhoeae,
and their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies.



Example 78 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 655>:

1 ..AAGGTGTGGC AATTTGTCGA AGA . CCGCTG CGTGCCGTCG TGCCTGCCGA 

51 CAGTTTTGAA CCGACCGCGC AAAAATTGAA CCTGTTTAAG GCGGGTGCGG 

101 CAACCATTTT GTTTTATGAA GATCAAAATG TCGTCAAAGG TTTGCAGGAG 

151 CAGTTCCCTG CTTATGCCGC TAACTTCCCC GTTTGGGCGg ATCAGGCAAA 

201 CGCGATGGTG CAGTATGCCG TTTGGACGAC ACTTGCCGCG GTCGGCGTAG 

251 GTGCAAACCT GCAACATTAC AATCCCTTGC CCGATGCGGC GATTGCCAAA 

301 GCGTGGAATA TCCCCGAAAA CTGGTTGTTG CGCGCACAAA TGGTTATCGG 

351 CGGTATTGAA GGGGCGGCAG GTGAAAAGAC CTTTGAACCC GTTGCAGAAC 

401 GTTTGAAAGT GTTCGGCGCA TAA 

This corresponds to the amino acid sequence <SEQ ID 656; ORF6>: 

1 ..KVWQFVEXPL RAVVPADSFE PTAQKLNLFK AGAATILFYE DQNVVKGLQE
51 QFPAYAANFP VWADQANAMV QYAVWTTLAA VGVGANLQHY NPLPDAAIAK
101 AWNIPENWLL RAQMVIGGIE GAAGEKTFEP VAERLKVFGA * 
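Each amino acid sequence in this example is the conceptual translation of the DNA listed above it, read codon by codon in the first frame. A minimal sketch of that step, assuming the standard genetic code and treating any codon that contains an unreadable base (such as the "." in SEQ ID 655) as X, which is consistent with the X shown at that position in ORF6:

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str) -> str:
    """Translate DNA in frame 1; '*' marks a stop codon, 'X' an unreadable codon."""
    seq = "".join(dna.split()).upper()  # drop listing spaces, accept lowercase bases
    codons = (seq[i:i + 3] for i in range(0, len(seq) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


if __name__ == "__main__":
    # First 30 bases of SEQ ID 655, including the unreadable position.
    print(translate("AAGGTGTGGC AATTTGTCGA AGA.CCGCTG"))  # KVWQFVEXPL
```

Any trailing bases that do not complete a codon are ignored, as in a partial listing that ends mid-codon.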

Further sequence analysis revealed a further partial DNA sequence <SEQ ID 657>: 

1 ..CTGCGTGCCG TCGTGCCTGC CGACAGTTTT GAACCGACCG CGCAAAAATT

51 GAACCTGTTT AAGGCGGGTG CGGCAACCAT TTTGTTTTAT GAAGATCAAA 

101 ATGTCGTCAA AGGTTTGCAG GAGCAGTTCC CTGCTTATGC CGCTAACTTC 

151 CCCGTTTGGG CGGATCAGGC AAACGCGATG GTGCAGTATG CCGTTTGGAC 

201 GACACTTGCC GCGGTCGGCG TAGGTGCAAA CCTGCAACAT TACAATCCCT 

251 TGCCCGATGC GGCGATTGCC AAAGCGTGGA ATATCCCCGA AAACTGGTTG 

301 TTGCGCGCAC AAATGGTTAT CGGCGGTATT GAAGGGGCGG CAGGTGAAAA 

351 GACCTTTGAA CCCGTTGCAG AACGTTTGAA AGTGTTCGGC GCATAA 

This corresponds to the amino acid sequence <SEQ ID 658; ORF6-l>: 

1 ..LRAVVPADSF EPTAQKLNLF KAGAATILFY EDQNVVKGLQ EQFPAYAANF
51 PVWADQANAM VQYAVWTTLA AVGVGANLQH YNPLPDAAIA KAWNIPENWL 
101 LRAQMVIGGI EGAAGEKTFE PVAERLKVFG A*

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A) 

ORF6 shows 98.6% identity over a 140aa overlap with an ORF (ORF6a) from strain A of N. meningitidis:



10 20 30

orf6.pep KVWQFVEXPLRAVVPADSFEPTAQKLNLFK
I I I I I I I I I I I I I I I I I I I I I I I I I I I I
orf6a QIVEHAVLHTPSSFNSQSARVVVLFGEEHDKVWQFVEDALRAVVPADSFEPTAQKLNLFK
40 50 60 70 80 90

40 50 60 70 80 90

orf6.pep AGAATILFYEDQNVVKGLQEQFPAYAANFPVWADQANAMVQYAVWTTLAAVGVGANLQHY
I I I I I I I I I I I I I I I I I I I I I I I I I I I M I II I I I I I I I 1 I I I I I I I I I II 1 1 I I I 1 1 1 I
orf6a AGAATILFYEDQNVVKGLQEQFPAYAANFPVWADQANAMVQYAVWTTLAAVGVGANLQHY
100 110 120 130 140 150

100 110 120 130 140

orf6.pep NPLPDAAIAKAWNIPENWLLRAQMVIGGIEGAAGEKTFEPVAERLKVFGAX
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I
orf6a NPLPDAAIAKAWNIPENWLLRAQMVIGGIEGAAGEKTFEPVAERLKVFGAX
160 170 180 190 200

The full-length ORF6a nucleotide sequence <SEQ ID 659> is:

1 ATGACCCGTC AATCTCTGCA ACAGGCTGCC GAAAGCCGCC GTTCCATTTA

51 TTCGTTAAAT AAAAATCTGC CCGTCGGCAA AGATGAAATC GTCCAAATCG

101 TCGAACACGC CGTTTTGCAC ACACCTTCTT CGTTCAATTC CCAATCTGCC

151 CGTGTGGTCG TGCTGTTTGG CGAAGAGCAT GATAAGGTGT GGCAATTTGT

201 CGAAGACGCG CTGCGTGCCG TCGTGCCTGC CGACAGTTTT GAACCGACCG

251 CGCAAAAATT GAACCTGTTT AAGGCGGGTG CGGCAACTAT TTTGTTTTAT

301 GAAGATCAAA ATGTCGTCAA AGGTTTGCAG GAGCAGTTCC CTGCTTATGC

351 CGCCAACTTT CCCGTTTGGG CGGACCAGGC GAACGCGATG GTGCAGTATG

401 CCGTTTGGAC GACACTTGCC GCGGTCGGCG TAGGTGCAAA CCTGCAACAT

451 TACAATCCCT TGCCCGATGC GGCGATTGCC AAAGCGTGGA ATATCCCCGA

501 AAACTGGTTG TTGCGCGCAC AAATGGTTAT CGGCGGTATT GAAGGGGCGG

551 CAGGTGAAAA GACCTTTGAA CCAGTTGCAG AACGTTTGAA AGTGTTCGGC

601 GCATAA

This is predicted to encode a protein having amino acid sequence <SEQ ID 660>:

1 MTRQSLQQAA ESRRSIYSLN KNLPVGKDEI VQIVEHAVLH TPSSFNSQSA

51 RVVVLFGEEH DKVWQFVEDA LRAVVPADSF EPTAQKLNLF KAGAATILFY

101 EDQNVVKGLQ EQFPAYAANF PVWADQANAM VQYAVWTTLA AVGVGANLQH

151 YNPLPDAAIA KAWNIPENWL LRAQMVIGGI EGAAGEKTFE PVAERLKVFG

201 A*



ORF6a and ORF6-1 show 100.0% identity in 131 aa overlap: 

50 60 70 80 90 100 

orf6a.pep TPSSFNSQSARVVVLFGEEHDKVWQFVEDALRAVVPADSFEPTAQKLNLFKAGAATILFY

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
orf6-1 LRAVVPADSFEPTAQKLNLFKAGAATILFY

10 20 30 

110 120 130 140 150 160 

orf6a.pep EDQNVVKGLQEQFPAYAANFPVWADQANAMVQYAVWTTLAAVGVGANLQHYNPLPDAAIA
II III II II till! I II I I Mill III II IIMIM II I I i Mi I II I I IMIIMMII 
orf6-1 EDQNVVKGLQEQFPAYAANFPVWADQANAMVQYAVWTTLAAVGVGANLQHYNPLPDAAIA

40 50 60 70 80 90 

170 180 190 200 

orf6a.pep KAWNIPENWLLRAQMVIGGIEGAAGEKTFEPVAERLKVFGAX
I I I I M I II I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I 
orf6-1 KAWNIPENWLLRAQMVIGGIEGAAGEKTFEPVAERLKVFGAX

100 110 120 130 

Homology with a predicted ORF from N. gonorrhoeae

ORF6 shows 95.7% identity over a 140aa overlap with a predicted ORF (ORF6ng) from N. gonorrhoeae:
orf6.pep KVWQFVEXPLRAVVPADSFEPTAQKLNLFK 30

II Ml II I II I II II MM II l-lhll I 
orf6ng SNVSLDMSNPTVLRMGLPLYIASLRRGAIYKVWQFVEDALRAVVPADSFEPTAQKLKLFK 64

orf6.pep AGAATILFYEDQNVVKGLQEQFPAYAANFPVWADQANAMVQYAVWTTLAAVGVGANLQHY 90

M I I I I I M M M I M I I M I I II I II I I M I M I I I M I I I II I I I I II I I M II M I I 
orf6ng AGAATILFYEDQNVVKGLQEQFPAYAANFPVWADQANAMVQYAVWTTLAAVGAGANLQHY 124

orf6.pep NPLPDAAIAKAWNIPENWLLRAQMVIGGIEGAAGEKTFEPVAERLKVFGA 140

I I I M : M ! II II I II I i I I I I I I I II II I I II I I I M I II I II II ! I i !

orf6ng NPLPDVAIAKAWNIPENWLLRAQMVIGGIEGAAGEKVFEPVAERLKVFGA 174

The full-length ORF6ng nucleotide sequence <SEQ ID 661> was identified as:

1 ATGGCCGTTG CGTCAAATGT CAGCTTGGAT ATGTCCAATC CTACGGTGTT 

51 ACGCATGGGA TTACCCTTAT ATATTGCGTC CCTAAGAAGG GGCGCAATAT

101 ATAAGGTGTG GCAATTTGTC GAAGACGCGC TGCGTGCCGT CGTGCCTGCC 

151 GACAGTTTTG AACCGACCGC GCAAAAATTG AAGCTGTTTA AGGCGGGCGC 

201 GGCAACCATT TTGTTTTATG AAGATCAAAA TGTCGTCAAA GGTTTGCAGG 

251 AGCAGTTCCC TGCTTATGCC GCCAACTTTC CCGTTTGGGC GGACCAGGCG 

301 AACGCTATGG TACAGTATGC CGTCTGGACG ACACTTGCCG CGGTCGGTGC

351 AGGTGCAAAT CTGCAACATT ACAACCCCTT GCCCGATGTG GCGATTGCTA 

401 AAGCGTGGAA TATTCCCGAA AACTGGCTGT TGCGCGCGCA AATGGTTATC

451 GGTGGTATTG AAGGGGcggc aggtgaaaaa gtctttgaac CCGTTGCgga 

501 acgtttgAAA GTGTTCGGCG CATAA 

This encodes a protein having amino acid sequence <SEQ ID 662>:

1 MAVASNVSLD MSNPTVLRMG LPLYIASLRR GAIYKVWQFV EDALRAVVPA
51 DSFEPTAQKL KLFKAGAATI LFYEDQNVVK GLQEQFPAYA ANFPVWADQA
101 NAMVQYAVWT TLAAVGAGAN LQHYNPLPDV AIAKAWNIPE NWLLRAQMVI
151 GGIEGAAGEK VFEPVAERLK VFGA* 


ORF6ng and ORF6-1 show 96.9% identity in 131 aa overlap: 

10 20 30 

orf6-1.pep LRAVVPADSFEPTAQKLNLFKAGAATILFY

I I I II M II M II I II I : I II I I I M I M I 
orf6ng PTVLRMGLPLYIASLRRGAIYKVWQFVEDALRAVVPADSFEPTAQKLKLFKAGAATILFY

20 30 40 50 60 70 

40 50 60 70 80 90 

orf6-1.pep EDQNVVKGLQEQFPAYAANFPVWADQANAMVQYAVWTTLAAVGVGANLQHYNPLPDAAIA
I I I I I I I I I I I II I II M I M II I I I II II I I II I I M I II II : II I I M I I I I I I : I I I

orf6ng EDQNVVKGLQEQFPAYAANFPVWADQANAMVQYAVWTTLAAVGAGANLQHYNPLPDVAIA
80 90 100 110 120 130 

100 110 120 130 

orf6-1.pep KAWNIPENWLLRAQMVIGGIEGAAGEKTFEPVAERLKVFGAX

MM MM III MM MINIM Ml 1:1 I I II MIM III I
orf6ng KAWNIPENWLLRAQMVIGGIEGAAGEKVFEPVAERLKVFGAX

140 150 160 170 
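The identity figures in this example are the number of identical aligned residues divided by the length of the overlap. A minimal sketch, assuming two already-aligned strings of equal length with "-" for gaps and counting only columns where both sequences carry a residue (programs differ in how they treat gapped columns). Applied to ORF6-1 and residues 44-174 of ORF6ng, it finds four differing positions, i.e. 127 identities in 131 residues, which rounds to the 96.9% reported above:

```python
def percent_identity(a: str, b: str, gap: str = "-") -> tuple[int, int, float]:
    """Return (identities, overlap, percent) for two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    # Only columns where both sequences carry a residue count towards the overlap.
    pairs = [(x, y) for x, y in zip(a, b) if x != gap and y != gap]
    identities = sum(1 for x, y in pairs if x == y)
    overlap = len(pairs)
    percent = 100.0 * identities / overlap if overlap else 0.0
    return identities, overlap, percent


ORF6_1 = (
    "LRAVVPADSFEPTAQKLNLFKAGAATILFYEDQNVVKGLQEQFPAYAANFPVWADQANAM"
    "VQYAVWTTLAAVGVGANLQHYNPLPDAAIAKAWNIPENWLLRAQMVIGGIEGAAGEKTFE"
    "PVAERLKVFGA"
)
ORF6NG_44_174 = (
    "LRAVVPADSFEPTAQKLKLFKAGAATILFYEDQNVVKGLQEQFPAYAANFPVWADQANAM"
    "VQYAVWTTLAAVGAGANLQHYNPLPDVAIAKAWNIPENWLLRAQMVIGGIEGAAGEKVFE"
    "PVAERLKVFGA"
)

if __name__ == "__main__":
    ident, overlap, pct = percent_identity(ORF6_1, ORF6NG_44_174)
    print(f"{pct:.1f}% identity in {overlap} aa overlap")  # 96.9% identity in 131 aa overlap
```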

It is predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could
be useful antigens for vaccines or diagnostics, or for raising antibodies. 

Example 79 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 663>:

1 ..GGCTACAACT ACCTGTTCGC GCGCGGCAGC CGCATCGCCA ACTACCAAAT

51 CAACGGCATC CCCGTTGCCG ACGCGCTGGC CGATACGGGt CAATGCCAAC

101 ACCGCCGCCT ATGAGCGCGT AGAAGTCGTG CGCGGCGTGG CGGGGCTGCT 

151 GGACGGCACG GGCGAGCCTT CCGCCACCGT CAATCTGGTG CGCAAACGCC 

201 TGACCCGCAA GCCATTGTTT GAAGTCCGCG CCGAAGCgGG CAACCGcAAA 
251 CATTTCGGGC TGGACGCGGA CGTATCGGGC AGCCTGAACA CCGAAG.crC

301 rCTGCGCgGC CGCCTGGTTT CCAcCTTCGG ACGCGGCGAC TCGTGGCGGC 

351 GGCGCGAACG CAGCCGskAT GCCGAACTCT ACGGCATTTT GGAATACGAC 

401 ATCGCACCGC AAACCCGCGT CCACGCArGC ATGGACTACC AGCAGGCGAA 

451 AGAAACCGCC GACGCGCCGC TCAGcTACGC CGTGTACGAC AGCCAAGGTT 

501 ATGCCACCGC CTTCGGCCCG AAAGACAACC CCGCCACAAA TTGGGCGAAC 

551 AGCCACCACC GTGCGCTCAA CCTGTTCGCC GGCATCGAAC ACCGCTTCAA 

601 CCAAGACTGG AAACTCAAAG CCGAATACGA CTAC. . 

This corresponds to the amino acid sequence <SEQ ID 664; ORF23>:

1 ..GYNYLFARGS RIANYQINGI PVADALADTG NANTAAYERV EVVRGVAGLL

51 DGTGEPSATV NLVRKRLTRK PLFEVRAEAG NRKHFGLDAD VSGSLNTEXX 

101 LRGRLVSTFG RGDSWRRRER SRXAELYGIL EYDIAPQTRV HAXMDYQQAK 

151 ETADAPLSYA VYDSQGYATA FGPKDNPATN WANSHHRALN LFAGIEHRFN 

201 QDWKLKAEYD Y..

Further work revealed the complete nucleotide sequence <SEQ ID 665>: 

1 ATGACACGCT TCAAATATTC CCTGCTGTTT GCCGCCCTGT TGCCCGTGTA 

51 CGCGCAGGCC GATGTTTCTG TTTCAGACGA CCCCAAACCG CAGGAAAGCA 

101 CTGAATTGCC GACCATCACC GTTACCGCCG ACCGCACCGC GAGTTCCAAC 

151 GACGGCTACA CTGTTTCCGG CACGCACACC CCGCTCGGGC TGCCCATGAC 

201 CCTGCGCGAA ATCCCGCAGA GCGTCAGCGT CATCACATCG CAACAAATGC 

251 GCGACCAAAA CATCAAAACG CTCGACCGCG CCCTGTTGCA GGCGACCGGC 

301 ACCAGCCGCC AGATTTACGG CTCCGACCGC GCGGGCTACA ACTACCTGTT 

351 CGCGCGCGGC AGCCGCATCG CCAACTACCA AATCAACGGC ATCCCCGTTG 

401 CCGACGCGCT GGCCGATACG GGCAATGCCA ACACCGCCGC CTATGAGCGC 

451 GTAGAAGTCG TGCGCGGCGT GGCGGGGCTG CTGGACGGCA CGGGCGAGCC 

501 TTCCGCCACC GTCAATCTGG TGCGCAAACG CCTGACCCGC AAGCCATTGT 

551 TTGAAGTCCG CGCCGAAGCG GGCAACCGCA AACATTTCGG GCTGGACGCG 

601 GACGTATCGG GCAGCCTGAA CACCGAAGGC ACGCTGCGCG GCCGCCTGGT 

651 TTCCACCTTC GGACGCGGCG ACTCGTGGCG GCGGCGCGAA CGCAGCCGCG 

701 ATGCCGAACT CTACGGCATT TTGGAATACG ACATCGCACC GCAAACCCGC 

751 GTCCACGCAG GCATGGACTA CCAGCAGGCG AAAGAAACCG CCGACGCGCC 

801 GCTCAGCTAC GCCGTGTACG ACAGCCAAGG TTATGCCACC GCCTTCGGCC 

851 CGAAAGACAA CCCCGCCACA AATTGGGCGA ACAGCCGCCA CCGTGCGCTC 

901 AACCTGTTCG CCGGCATCGA ACACCGCTTC AACCAAGACT GGAAACTCAA 

951 AGCCGAATAC GACTACACCC GCAGCCGCTT CCGCCAGCCC TACGGCGTAG 

1001 CAGGCGTGCT TTCCATCGAC CACAACACCG CCGCCACCGA CCTGATTCCC 

1051 GGTTATTGGC ACGCCGACCC GCGCACCCAC AGCGCCAGCG TGTCATTGAT 

1 1101 CGGCAAATAC CGCCTGTTCG GCCGCGAACA CGATTTAATC GCGGGTATCA 

1151 ACGGTTACAA ATACGCCAGC AACAAATACG GCGAACGCAG CATCATCCCC 

1201 AACGCCATTC CCAACGCCTA CGAATTTTCC CGCACGGGTG CCTACCCGCA 

1251 GCCTGCATCG TTTGCCCAAA CCATCCCGCA ATACGGCACC AGGCGGCAAA 

1301 TCGGCGGCTA TCTCGCCACC CGTTTCCGCG CCGCCGACAA CCTTTCGCTG 

1351 ATTTTGGGCG GACGATACAC CCGTTACCGC ACCGGCAGCT ACGACAGCCG 

1401 CACACAAGGC ATGACCTATG TGTCCGCCAA CCGTTTCACC CCCTACACAG 

1451 GCATCGTGTT CGACCTGACC GGCAACCTGT CTCTTTACGG CTCGTACAGC 

1501 AGCCTGTTCG TCCCGCAATC GCAAAAAGAC GAACACGGCA GCTACCTGAA 

1551 ACCCGTAACC GGCAACAATC TGGAAGCCGG CATCAAAGGC GAATGGCTTG 

1601 AAGGCCGTCT GAACGCATCC GCCGCCGTGT ACCGCGCCCG TAAAAACAAC 

1651 CTCGCCACCG CAGCAGGACG CGACCCGAGC GGCAACACCT ACTACCGCGC 

1701 CGCCAACCAA GCCAAAACCC ACGGCTGGGA AATCGAAGTC GGCGGCCGCA 

1751 TCACGCCCGA ATGGCAGATA CAGGCAGGTT ACAGCCAAAG CAAAACCCGC 

1801 GACCAAGACG GCAGCCGCCT GAACCCCGAC AGCGTACCCG AACGCAGCTT 

1851 CAAACTCTTC ACTGCCTACC ACTTTGCCCC CGAAGCCCCC AGCGGCTGGA 

1901 CCATCGGCGC AGGCGTGCGC TGGCAGAGCG AAACCCACAC CGACCCTGCC 

1951 ACGCTCCGCA TCCCCAACCC CGCCGCCAAA GCCCGCGCCG CCGACAACAG 

2001 CCGCCAAAAA GCCTACGCCG TCGCCGACAT CATGGCGCGT TACCGCTTCA 

2051 ATCCGCGCGC CGAACTGTCG CTGAACGTGG ACAATCTGTT CAACAAACAC 

2101 TACCGCACCC AGCCCGACCG CCACAGCTAC GGCGCACTGC GGACAGTGAA 

2151 CGCGGCGTTT ACCTATCGGT TTAAATAA 
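A full-length open reading frame should start with ATG, have a length divisible by three, and contain exactly one in-frame stop codon, at its end. The listing above is 2,178 nt long (725 codons plus a terminal TAA), consistent with the 725-residue ORF23-1 that follows. A minimal check, assuming the standard stop codons TAA, TAG and TGA and ignoring alternative start codons such as GTG; the position numbers at the start of each listing row would have to be stripped before calling it:

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def is_complete_orf(dna: str) -> bool:
    """True if dna (frame 1) runs from an ATG to a single, terminal stop codon."""
    seq = "".join(dna.split()).upper()  # ignore the spaces between 10-base blocks
    if len(seq) < 6 or len(seq) % 3 != 0 or not seq.startswith("ATG"):
        return False
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    # The last codon must be a stop, and no stop may occur before it.
    return codons[-1] in STOP_CODONS and not any(c in STOP_CODONS for c in codons[:-1])


if __name__ == "__main__":
    print(is_complete_orf("ATG AAA TAA"))  # True
    print(is_complete_orf("ATG TAA AAA TAA"))  # False: internal stop codon
```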

This corresponds to the amino acid sequence <SEQ ID 666; ORF23-l>: 

1 MTRFKYSLLF AALLPVYAQA DVSVSDDPKP QESTELPTIT VTADRTASSN 

51 DGYTVSGTHT PLGLPMTLRE IPQSVSVITS QQMRDQNIKT LDRALLQATG 

101 TSRQIYGSDR AGYNYLFARG SRIANYQING IPVADALADT GNANTAAYER 

151 VEVVRGVAGL LDGTGEPSAT VNLVRKRLTR KPLFEVRAEA GNRKHFGLDA

201 DVSGSLNTEG TLRGRLVSTF GRGDSWRRRE RSRDAELYGI LEYDIAPQTR 

251 VHAGMDYQQA KETADAPLSY AVYDSQGYAT AFGPKDNPAT NWANSRHRAL

301 NLFAGIEHRF NQDWKLKAEY DYTRSRFRQP YGVAGVLSID HNTAATDLIP

351 GYWHADPRTH SASVSLIGKY RLFGREHDLI AGINGYKYAS NKYGERSIIP

401 NAIPNAYEFS RTGAYPQPAS FAQTIPQYGT RRQIGGYLAT RFRAADNLSL

451 ILGGRYTRYR TGSYDSRTQG MTYVSANRFT PYTGIVFDLT GNLSLYGSYS

501 SLFVPQSQKD EHGSYLKPVT GNNLEAGIKG EWLEGRLNAS AAVYRARKNN

551 LATAAGRDPS GNTYYRAANQ AKTHGWEIEV GGRITPEWQI QAGYSQSKTR

601 DQDGSRLNPD SVPERSFKLF TAYHFAPEAP SGWTIGAGVR WQSETHTDPA

651 TLRIPNPAAK ARAADNSRQK AYAVADIMAR YRFNPRAELS LNVDNLFNKH

701 YRTQPDRHSY GALRTVNAAF TYRFK*


Computer analysis of this amino acid sequence gave the following results: 

Homology with the ferric-pseudobactin receptor PupB of Pseudomonas putida (accession number P38047)
ORF23 and PupB protein show 32% aa identity in 205aa overlap: 






Orf23 6 FARGSRIANYQINGIPVADALADTGNANTAAYERVEVVRGVAGLLDGTGEPSATVNLVRK 65

++RG I NY+++G+P + L D + + A ++RVE+VRG GL+ G G PSAT+NL+RK 
PupB 215 WSRGFAIQNYEVDGVPTSTRL-DNYSQSMAMFDRVEIVRGATGLISGMGNPSATINLIRK 273 

Orf23 66 RLTRKPLFEVRAEAGNRKHFGLDADVSGSLNTEXXLRGRLVSTFXXXXXXXXXXXXXXAE 125

R T + + EAGN +G DVSG L +RGR V+ + 

PupB 274 RPTAEAQASITGEAGNWDRYGTGFDVSGPLTETGNIRGRFVADYKTEKAWIDRYNQQSQL 333 

Orf23 126 LYGILEYDIAPQTRVHAXMDYQQAKETADAPLSYAVYD--SQGYATAFGPKDNPATNWAN 183

+YGI E+D++ T + Y + D+PL + S G T N A +W+ 

PupB 334 MYGITEFDLSEDTLLTVGFSY--LRSDIDSPLRSGLPTRFSTGERTNLKRSLNAAPDWSY 391

Orf23 184 SHHRALNLFAGIEHRFNQDWKLKAE 208 

+ H +FIE+ WKE 
PupB 392 NDHEQTSFFTSIEQQLGNGWSGKIE 416 
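In the BLAST-style alignment above, each row is labelled with the positions of its first and last residue, and gap characters do not advance the position. A small sketch of that bookkeeping, assuming "-" is the only gap symbol: the end coordinate is the start plus the number of non-gap characters, minus one.

```python
def end_coordinate(start: int, aligned_row: str, gap: str = "-") -> int:
    """Position of the last residue in an aligned row that begins at `start`."""
    residues = sum(1 for ch in aligned_row if ch != gap)
    return start + residues - 1


if __name__ == "__main__":
    row = "LYGILEYDIAPQTRVHAXMDYQQAKETADAPLSYAVYD--SQGYATAFGPKDNPATNWAN"
    print(end_coordinate(126, row))  # 183
```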

Homology with a predicted ORF from N. meningitidis (strain A) 

ORF23 shows 95.7% identity over a 211aa overlap with an ORF (ORF23a) from strain A of N. meningitidis:

10 20 30 

orf23.pep GYNYLFARGSRIANYQINGIPVADALADTG

I I I I I I I I I I I I I I I I I I I I I II I I I I I I I 
orf23a QMRDQNIKALDRALLQATGTSRQIYGSDRAGYNYLFARGSRIANYQINGIPVADALADTG

90 100 110 120 130 140 

40 50 60 70 80 90 

orf23.pep NANTAAYERVEVVRGVAGLLDGTGEPSATVNLVRKRLTRKPLFEVRAEAGNRKHFGLDAD
I I I I I I I I I I I I I! M 1 I I I I [ I I I II I I I I I I I I I I I I I I I I I I I II I I I I I I II II 
orf23a NANTAAYERVEVVRGVAGLLDGTGEPSATVNLVRKRPTRKPLFEVRAEAGNRKHFGLGAD
150 160 170 180 190 200 

100 110 120 130 140 150 

orf23.pep VSGSLNTEXXLRGRLVSTFGRGDSWRRRERSRXAELYGILEYDIAPQTRVHAXMDYQQAK
I j I I I I : I : I I I I I I I I I I I I I I I I : I I I I I I I I I I I I I I I I I I I I I I I I I I II I i I 
orf23a VSGSLNAEGTLRGRLVSTFGRGDSWRQRERSRDAELYGILEYDIAPQTRVHAGMDYQQAK
210 220 230 240 250 260 

160 170 180 190 200 210 

orf23.pep ETADAPLSYAVYDSQGYATAFGPKDNPATNWANSHHRALNLFAGIEHRFNQDWKLKAEYD

I I II I I I I I ! II I I I I I I I I I I I I I I I I I I I I I I : I I I I I I I I I I I I II I I I I I I I M I I 
orf23a ETADAPLSYAVYDSQGYATAFGPKDNPATNWANSRHRALNLFAGIEHRFNQDWKLKAEYD
270 280 290 300 310 320 



orf23.pep Y
I 

orf23a YTRSRFRQPYGVAGVLSIDHNTAATDLIPGYWHADPRTHSASVSLIGKYRLFGREHDLIA

330 340 350 360 370 380 

The full-length ORF23a nucleotide sequence <SEQ ID 667> is:

1 ATGACACGCT TCAAATATTC CCTGCTGTTT GCCGCCCTGT TGCCCGTGTA

51 CGCGCAGGCC GATGTTTCTG TTTCAGACGA CCCAAAACCG CAGGAAAGCA

101 CTGAATTGCC GACCATCACC GTTACCGCCG ACCGCACCGC GAGTTCCAAC

151 GACGGCTACA CTGTTTCCGG CACGCACACC CCGCTCGGGC TGCCCATGAC

201 CCTGCGCGAA ATCCCGCAGA GCGTCAGCGT CATCACATCG CAACAAATGC

251 GCGACCAAAA CATCAAAGCG CTCGACCGCG CCCTGTTGCA GGCGACCGGC

301 ACCAGCCGCC AGATTTACGG CTCCGACCGC GCGGGCTACA ACTACCTGTT

351 CGCGCGCGGC AGCCGCATCG CCAACTACCA AATCAACGGC ATCCCCGTTG

401 CCGACGCGCT GGCCGATACG GGCAATGCCA ACACCGCCGC CTATGAGCGC

451 GTAGAAGTCG TGCGCGGCGT GGCGGGGCTG CTGGACGGCA CGGGCGAGCC

501 TTCCGCCACC GTCAATCTGG TGCGCAAACG CCCGACCCGC AAGCCATTGT

551 TTGAAGTCCG CGCCGAAGCG GGCAACCGCA AACATTTCGG GCTGGGCGCG

601 GACGTATCGG GCAGCCTGAA TGCCGAAGGC ACGCTGCGCG GCCGCCTGGT

651 TTCCACCTTC GGACGCGGCG ACTCGTGGCG GCAGCGCGAA CGCAGCCGCG

701 ATGCCGAACT CTACGGCATT TTGGAATACG ACATCGCACC GCAAACCCGC

751 GTCCACGCAG GCATGGACTA CCAGCAGGCG AAAGAAACCG CCGACGCGCC

801 GCTCAGCTAC GCCGTGTACG ACAGCCAAGG TTATGCCACC GCCTTCGGCC

851 CGAAAGACAA CCCCGCCACA AATTGGGCGA ACAGCCGCCA CCGTGCGCTC

901 AACCTGTTCG CCGGCATCGA ACACCGCTTC AACCAAGACT GGAAACTCAA

951 AGCCGAATAC GACTACACCC GCAGCCGCTT CCGCCAGCCC TACGGCGTAG

1001 CAGGCGTGCT TTCCATCGAC CACAACACCG CCGCCACCGA CCTGATTCCC

1051 GGTTATTGGC ACGCCGACCC GCGCACCCAC AGCGCCAGCG TGTCATTAAT

1101 CGGCAAATAC CGCCTGTTCG GCCGCGAACA CGATTTAATC GCGGGTATCA

1151 ACGGTTACAA ATACGCCAGC AACAAATACG GCGAACGCAG CATCATCCCC

1201 AACGCCATTC CCAACGCCTA CGAATTTTCC CGCACGGGTG CCTACCCGCA

1251 GCCTGCATCG TTTGCCCAAA CCATCCCGCA ATACGGCACC AGGCGGCAAA

1301 TCGGCGGCTA TCTCGCCACC CGTTTCCGCG CCGCCGACAA CCTTTCGCTG

1351 ATACTCGGCG GCAGATACAG CCGTTACCGC ACCGGCAGCT ACGACAGCCG

1401 CACACAAGGC ATGACCTATG TGTCCGCCAA CCGTTTCACC CCCTACACAG

1451 GCATCGTGTT CGACCTGACC GGCAACCTGT CGCTTTACGG CTCGTACAGC

1501 AGCCTGTTCG TCCCGCAATC GCAAAAAGAC GAACACGGCA GCTACCTGAA

1551 ACCCGTAACC GGCAACAATC TGGAAGCCGG CATCAAAGGC GAATGGCTTG

1601 AAGGCCGTCT GAACGCATCC GCCGCCGTGT ACCGCGCCCG TAAAAACAAC

1651 CTCGCCACCG CAGCAGGACG CGACCCGAGC GGCAACACCT ACTACCGCGC

1701 CGCCAACCAA GCCAAAACCC ACGGCTGGGA AATCGAAGTC GGCGGCCGCA

1751 TCACGCCCGA ATGGCAGATA CAGGCAGGTT ACAGCCAAAG CAAAACCCGC

1801 GACCAAGACG GCAGCCGCCT GAACCCCGAC AGCGTACCCG AACGCAGCTT

1851 CAAACTCTTC ACTGCCTACC ACTTTGCCCC CGAAGCCCCC AGCGGCTGGA

1901 CCATCGGCGC AGGCGTGCGC TGGCAGAGCG AAACCCACAC CGACCCTGCC

1951 ACGCTCCGCA TCCCCAACCC CGCCGCCAAA GCCCGCGCCG CCGACAACAG

2001 CCGCCAAAAA GCCTACGCCG TCGCCGACAT CATGGCGCGT TACCGCTTCA

2051 ATCCGCGCGC CGAACTGTCG CTGAACGTGG ACAATCTGTT CAACAAACAC

2101 TACCGCACCC AGCCCGACCG CCACAGCTAC GGCGCACTGC GGACAGTGAA

2151 CGCGGCGTTT ACCTATCGGT TTAAATAA



This encodes a protein having amino acid sequence <SEQ ID 668>: 



1 MTRFKYSLLF AALLPVYAQA DVSVSDDPKP QESTELPTIT VTADRTASSN 

51 DGYTVSGTHT PLGLPMTLRE IPQSVSVITS QQMRDQNIKA LDRALLQATG 

101 TSRQIYGSDR AGYNYLFARG SRIANYQING IPVADALADT GNANTAAYER

151 VEVVRGVAGL LDGTGEPSAT VNLVRKRPTR KPLFEVRAEA GNRKHFGLGA

201 DVSGSLNAEG TLRGRLVSTF GRGDSWRQRE RSRDAELYGI LEYDIAPQTR 

251 VHAGMDYQQA KETADAPLSY AVYDSQGYAT AFGPKDNPAT NWANSRHRAL 

301 NLFAGIEHRF NQDWKLKAEY DYTRSRFRQP YGVAGVLSID HNTAATDLIP 

351 GYWHADPRTH SASVSLIGKY RLFGREHDLI AGINGYKYAS NKYGERSIIP 

401 NAIPNAYEFS RTGAYPQPAS FAQTIPQYGT RRQIGGYLAT RFRAADNLSL 

451 ILGGRYSRYR TGSYDSRTQG MTYVSANRFT PYTGIVFDLT GNLSLYGSYS 

501 SLFVPQSQKD EHGSYLKPVT GNNLEAGIKG EWLEGRLNAS AAVYRARKNN 

551 LATAAGRDPS GNTYYRAANQ AKTHGWEIEV GGRITPEWQI QAGYSQSKTR 

601 DQDGSRLNPD SVPERSFKLF TAYHFAPEAP SGWTIGAGVR WQSETHTDPA 

651 TLRIPNPAAK ARAADNSRQK AYAVADIMAR YRFNPRAELS LNVDNLFNKH 

701 YRTQPDRHSY GALRTVNAAF TYRFK* 
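Near-identical sequences such as ORF23a and ORF23-1 are easiest to compare by listing the positions where they differ. A minimal sketch for two ungapped, equal-length segments that start at the same residue number; the example uses residues 151-200 of SEQ ID 666 (ORF23-1) and SEQ ID 668 (ORF23a), which differ at positions 178 and 199:

```python
def substitutions(a: str, b: str, start: int = 1) -> list[tuple[int, str, str]]:
    """List (position, residue_in_a, residue_in_b) wherever two ungapped segments differ."""
    if len(a) != len(b):
        raise ValueError("segments must have equal length")
    return [(start + i, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]


if __name__ == "__main__":
    orf23_1 = "VEVVRGVAGLLDGTGEPSATVNLVRKRLTRKPLFEVRAEAGNRKHFGLDA"  # SEQ ID 666, 151-200
    orf23a = "VEVVRGVAGLLDGTGEPSATVNLVRKRPTRKPLFEVRAEAGNRKHFGLGA"   # SEQ ID 668, 151-200
    print(substitutions(orf23_1, orf23a, start=151))  # [(178, 'L', 'P'), (199, 'D', 'G')]
```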

ORF23a and ORF23-1 show 99.2% identity in 725 aa overlap: 



10 20 30 40 50 60 

orf23a.pep MTRFKYSLLFAALLPVYAQADVSVSDDPKPQESTELPTITVTADRTASSNDGYTVSGTHT
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I I II I 1 I I I I I I I I I i I I I I I I I I I [ I 
orf23-l MTRFKYSLLFAALLPVYAQADVSVSDDPKPQESTELPTITVTADRTASSNDGYTVSGTHT 

10 20 30 40 50 60 



WO 99/24578 






PCT/IB98/01665 






                    70        80        90       100       110       120
orf23a.pep  PLGLPMTLREIPQSVSVITSQQMRDQNIKALDRALLQATGTSRQIYGSDRAGYNYLFARG
            |||||||||||||||||||||||||||||:||||||||||||||||||||||||||||||
orf23-1     PLGLPMTLREIPQSVSVITSQQMRDQNIKTLDRALLQATGTSRQIYGSDRAGYNYLFARG
                    70        80        90       100       110       120

                   130       140       150       160       170       180
orf23a.pep  SRIANYQINGIPVADALADTGNANTAAYERVEVVRGVAGLLDGTGEPSATVNLVRKRPTR
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||| ||
orf23-1     SRIANYQINGIPVADALADTGNANTAAYERVEVVRGVAGLLDGTGEPSATVNLVRKRLTR
                   130       140       150       160       170       180

                   190       200       210       220       230       240
orf23a.pep  KPLFEVRAEAGNRKHFGLGADVSGSLNAEGTLRGRLVSTFGRGDSWRQRERSRDAELYGI
            |||||||||||||||||| ||||||||:|||||||||||||||||||:||||||||||||
orf23-1     KPLFEVRAEAGNRKHFGLDADVSGSLNTEGTLRGRLVSTFGRGDSWRRRERSRDAELYGI
                   190       200       210       220       230       240

                   250       260       270       280       290       300
orf23a.pep  LEYDIAPQTRVHAGMDYQQAKETADAPLSYAVYDSQGYATAFGPKDNPATNWANSRHRAL
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf23-1     LEYDIAPQTRVHAGMDYQQAKETADAPLSYAVYDSQGYATAFGPKDNPATNWANSRHRAL
                   250       260       270       280       290       300

                   310       320       330       340       350       360
orf23a.pep  NLFAGIEHRFNQDWKLKAEYDYTRSRFRQPYGVAGVLSIDHNTAATDLIPGYWHADPRTH
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf23-1     NLFAGIEHRFNQDWKLKAEYDYTRSRFRQPYGVAGVLSIDHNTAATDLIPGYWHADPRTH
                   310       320       330       340       350       360

                   370       380       390       400       410       420
orf23a.pep  SASVSLIGKYRLFGREHDLIAGINGYKYASNKYGERSIIPNAIPNAYEFSRTGAYPQPAS
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf23-1     SASVSLIGKYRLFGREHDLIAGINGYKYASNKYGERSIIPNAIPNAYEFSRTGAYPQPAS
                   370       380       390       400       410       420

                   430       440       450       460       470       480
orf23a.pep  FAQTIPQYGTRRQIGGYLATRFRAADNLSLILGGRYSRYRTGSYDSRTQGMTYVSANRFT
            ||||||||||||||||||||||||||||||||||||:|||||||||||||||||||||||
orf23-1     FAQTIPQYGTRRQIGGYLATRFRAADNLSLILGGRYTRYRTGSYDSRTQGMTYVSANRFT
                   430       440       450       460       470       480

                   490       500       510       520       530       540
orf23a.pep  PYTGIVFDLTGNLSLYGSYSSLFVPQSQKDEHGSYLKPVTGNNLEAGIKGEWLEGRLNAS
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf23-1     PYTGIVFDLTGNLSLYGSYSSLFVPQSQKDEHGSYLKPVTGNNLEAGIKGEWLEGRLNAS
                   490       500       510       520       530       540

                   550       560       570       580       590       600
orf23a.pep  AAVYRARKNNLATAAGRDPSGNTYYRAANQAKTHGWEIEVGGRITPEWQIQAGYSQSKTR
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf23-1     AAVYRARKNNLATAAGRDPSGNTYYRAANQAKTHGWEIEVGGRITPEWQIQAGYSQSKTR
                   550       560       570       580       590       600

                   610       620       630       640       650       660
orf23a.pep  DQDGSRLNPDSVPERSFKLFTAYHFAPEAPSGWTIGAGVRWQSETHTDPATLRIPNPAAK
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf23-1     DQDGSRLNPDSVPERSFKLFTAYHFAPEAPSGWTIGAGVRWQSETHTDPATLRIPNPAAK
                   610       620       630       640       650       660

                   670       680       690       700       710       720
orf23a.pep  ARAADNSRQKAYAVADIMARYRFNPRAELSLNVDNLFNKHYRTQPDRHSYGALRTVNAAF
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf23-1     ARAADNSRQKAYAVADIMARYRFNPRAELSLNVDNLFNKHYRTQPDRHSYGALRTVNAAF
                   670       680       690       700       710       720

orf23a.pep  TYRFKX
            ||||||
orf23-1     TYRFKX
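The percent identities quoted in these examples are the fraction of aligned positions that carry the same residue. As a minimal sketch, assuming the two sequences are already aligned to equal length with '-' marking gap columns, the Python below recomputes such a figure and draws a simple match line. It marks identities only; the ':' symbols for conservative substitutions depend on the scoring scheme of the original alignment program, which this sketch does not reproduce. The function name `identity` is hypothetical.

```python
def identity(seq_a: str, seq_b: str) -> tuple[int, int, str]:
    """Compare two pre-aligned sequences of equal length.

    Returns (identical, overlap, match_line). Columns containing a gap ('-')
    are left out of the overlap; all other columns are compared letter by letter.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    identical = 0
    overlap = 0
    marks = []
    for a, b in zip(seq_a, seq_b):
        if a == "-" or b == "-":
            marks.append(" ")  # gap column: not counted in the overlap
            continue
        overlap += 1
        if a == b:
            identical += 1
            marks.append("|")
        else:
            marks.append(" ")
    return identical, overlap, "".join(marks)


# Residues 61-120 of ORF23a and ORF23-1, which differ only at position 90.
a = "PLGLPMTLREIPQSVSVITSQQMRDQNIKALDRALLQATGTSRQIYGSDRAGYNYLFARG"
b = "PLGLPMTLREIPQSVSVITSQQMRDQNIKTLDRALLQATGTSRQIYGSDRAGYNYLFARG"
same, total, line = identity(a, b)
print(f"{same}/{total} = {100 * same / total:.1f}% identity")
print(a)
print(line)
print(b)
```

Over the full 725-residue ORF23a/ORF23-1 alignment the sequences differ at six positions (90, 178, 199, 208, 228 and 457), so the same calculation gives 719/725, or about 99.2%, in line with the figure stated.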
Homology with a predicted ORF from N. gonorrhoeae

ORF23 shows 93.4% identity over a 211 aa overlap with a predicted ORF (ORF23ng) from
N. gonorrhoeae:

orf23.pep GYNYLFARGSRIANYQINGIPVADALADTGNANTAAYERVEVVRGVAGLLD 51

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
orf23ng SAVDACRIPGYNYLFARGSRIANYQINGIPVADALADTGNANTAAYERVEVVRGVAGLPD 60

orf23.pep GTGEPSATVNLVRKRLTRKPLFEVRAEAGNRKHFGLDADVSGSLNTEXXLRGRLVSTFGR 111

I I I I M 1 I I 1 I I I I : I I I I I I I I I I I I I I I I I I I I I I I I II I I : I : I I I I I II I I I I 
orf23ng GTGEPSATVNLVRKHPTRKPLFEVRAEAGNRKHFGLGADVSGSLNAEGTLRGRLVSTFGR 120 

orf23.pep GDSWRRRERSRXAELYGILEYDIAPQTRVHAXMDYQQAKETADAPLSYAVYDSQGYATAF 171

Mill: I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I II I I I I I I M 
orf23ng GDSWRQLERSRDAELYGILEYDIAPQTRVHAGMDYQQAKETADAPLSYAVYDSQGYATAF 180

orf23.pep GPKDNPATNWANSHHRALNLFAGIEHRFNQDWKLKAEYDY 211

I I I I I I I I I I : I I :: I I I I I II I M II I I I I I I I I I I I II 
orf23ng GPKDNPATNWSNSRNRALNLFAGIEHRFNQDWKLKAEYDYTRSRFRQPYGVAGVLSIDHS 240 

The ORF23ng nucleotide sequence <SEQ ID 669> is predicted to encode a protein comprising 
amino acid sequence <SEQ ID 670>: 

1 SAVDACRIPG YNYLFARGSR IANYQINGIP VADALADTGN ANTAAYERVE 

51 VVRGVAGLPD GTGEPSATVN LVRKHPTRKP LFEVRAEAGN RKHFGLGADV

101 SGSLNAEGTL RGRLVSTFGR GDSWRQLERS RDAELYGILE YDIAPQTRVH 

151 AGMDYQQAKE TADAPLSYAV YDSQGYATAF GPKDNPATNW SNSRNRALNL 

201 FAGIEHRFNQ DWKLKAEYDY TRSRFRQPYG VAGVLSIDHS TAATDLIPGY 

251 WHADPRTHSA SMSLTGKYRL FGREHDLIAG INGYKYASNK YGERSIIPNA 

301 IPNAYEFSRT GAYPQPSSFA QTIPQYDTRR QIGGYLATRF RAADNLSLIL 

351 GGRYSRYRAG SYNSRTQGMT YVSANRFTPY TGIVFDLTGN LSLYGSYSSL 

401 FVPQLQKDEH GSYLKPVTGN NLEADIKGEW LEGRLNASAA VYRARKNNLA 

451 TAAGRDQSGN TYYRAANQAK THGWEIEVGG RITPEWQIQA GYSQSKPRDQ 

501 DGSRLNPDSV PERSFKLFTA YHLAPEAPSG RTIGAGVRRQ GETHTDPAAL 

551 RIPNPAAKAR AVANSRQKAY AVADIMARYR FNPRTELSLN VDNLFNKHYR 

601 TQPDRHSYGA LRTVNAAFTY RFK* 

Further work revealed the complete nucleotide sequence <SEQ ID 671>:



1 ATGACACGCT TCAAATACTC CCTGCTTTTT GCCGCCCTGC TACCCGTGTA

51 CGCGCAGGCC GATGTTTCTG TTTCAGACGA CCCCAAACCG CAGGAAAGCA

101 CCGAATTGCC GACCATCACC GTTACCGCCG ACCGCACCGC GAGTTCCAAC

151 GACGGCTACA CCGTTTCCGG CACGCACACC CCGTTCGGGC TGCCCATGAC

201 CCTGCGCGAA ATCCCGCAGA GCGTCAGCGT CATCACATCG CAACAAATGC

251 GCGACCAAAA CATCAAAACG CTCGACCGCG CCCTGTTGCA GGCGACCGGC

301 ACCAGCCGCC AGATTTACGG CTCCGACCGC GCGGGCTACA ACTACCTGTT

351 CGCGCGCGGC AGCCGCATCG CCAACTACCA AATCAACGGC ATCCCCGTTG

401 CCGACGCGCT GGCCGATACG GGCAATGCCA ACACCGCCGC CTATGAGCGC

451 GTAGAAGTCG TGCGCGGCGT GGCGGGGCTG CCGGACGGCA CGGGCGAGCC

501 TTCTGCCACC GTCAATCTGG TACGCAAACA CCCGACCCGC AAGCCATTGT

551 TTGAAGTCCG CGCCGAAGCC GGCAACCGCA AACATTTCGG GCTGGGCGCG

601 GACGTATCGG GCAGCCTGAA CGCCGAAGGC ACGCTGCGCG GCCGCCTGGT

651 TTCCACCTTC GGACGCGGCG ACTCGTGGCG GCAGCTCGAA CGCAGCCGCG

701 ATGCCGAACT CTACGGCATT TTGGAATACG ACATCGCACC GCAAACCCGC

751 GTCCACGCAG GCATGGACTA CCAGCAGGCG AAAGAAACCG CAGACGCGCC

801 GCTCAGCTAC GCCGTGTACG ACAGCCAAGG TTATGCCACC GCCTTCGGCC

851 CAAAAGACAA CCCCGCCACA AATTGGTCGA ACAGCCGCAA CCGTGCGCTC

901 AACCTGTTCG CCGGCATAGA ACACCGCTTC AACCAAGACT GGAAACTCAA

951 AGCCGAATAC GACTACACCC GTAGCCGCTT CCGCCAGCCC TACGGTGTGG

1001 CAGGCGTACT TTCCATCGAC CACAGCACTG CCGCCACCGA CCTGATTCCC

1051 GGTTATTGGC ACGCcgatcc GCGCACCCAC AGCGCCAGCA TGTCATTGAC

1101 CGGCAAATAC CgcctGTTCG GCCGCGAGCA CGATTTAATC GCGGGTATCA

1151 ACGGCTACAA ATACGCCAGC AACAAATACG GCGAACGCAG CATCATTCCC

1201 AACGCCATTC CCAACGCCTA CGAATTTTCC CGCACGGGCG CCTATCCGCA

1251 GCCATCATCG TTTGCCCAAA CCATCCCGCA ATACGACACC AGGCGGCAAA

1301 TCGGCGGCTA TCTCGCCACC CGTTTCCGCG CCGCCGACAA CCTTTCGCTG

1351 ATACTCGGCG GCAGATACAG CCGCTACCGC GCAGGCAGCT ACAACAGCCG
1401 CACACAAGGC ATGACCTATG TGTCCGCCAA CCGTTTCACC CCCTACACAG

1451 GCATCGTGTT CGATCTGACC GGCAACCTGT CGCTTTACGG CTCGTACAGC 

1501 AGCCTGTTCG TCCCGCAATT GCAAAAAGAC GAACACGGCA GCTACCTGAA 

1551 ACCCGTAACC GGCAACAATC TGGAAGCCGA CATCAAAGGC GAATGGCTTG 

1601 AAGGGCGTCT GAACGCATCC GCCGCCGTGT ACCGCGCCCG TAAAAACAAC 

1651 CTCGCCACCG CAGCAGGACG CGACCAGAGC GGCAACACCT ACTATCGCGC 

1701 CGCCAACCAA GCCAAAACCC ACGGCTGGGA AATCGAAGTC GGCGGCCGCA 

1751 TCACGCCCGA ATGGCAGATA CAGGCAGGCT ACAGCCAAAG CAAACCCCGC 

1801 GACCAAGACG GCAGCCGCCT GAACCCCGAC AGCGTAcCCG AACGCAGCTT 

1851 CAAACTCTTC ACCGCCTACC ACTTAGCCCC CGAAGCCCCC AGCGGCCGGA 

1901 CCATcggTGC GGGTGTGCGC CGGCAGGGCG AAACCCACAC CGACCCAGCC 

1951 GCGCTCCGCA TCCCCAACCC CGCCGCCAAA GCCCGCGCCG TCGCCAACAG 

2001 CCGCCAGAAA GCCTACGCCG TCGCCGACAT CATGGCGCGT TACCGCTTCA 

2051 ATCCGCGCAC CGAACTGTCG CTGAACGTGG ACAACCTGTT CAACAAACAC 

2101 TACCGCACCC AGCCCGACCG CCACAGCTAC GGCGCACTGC GGACAGTGAA 

2151 CGCGGCGTTT ACCTATCGGT TTAAATAA 

This corresponds to the amino acid sequence <SEQ ID 672; ORF23ng-1>:



1 MTRFKYSLLF AALLPVYAQA DVSVSDDPKP QESTELPTIT VTADRTASSN 

51 DGYTVSGTHT PFGLPMTLRE IPQSVSVITS QQMRDQNIKT LDRALLQATG 

101 TSRQIYGSDR AGYNYLFARG SRIANYQING IPVADALADT GNANTAAYER 

151 VEVVRGVAGL PDGTGEPSAT VNLVRKHPTR KPLFEVRAEA GNRKHFGLGA

201 DVSGSLNAEG TLRGRLVSTF GRGDSWRQLE RSRDAELYGI LEYDIAPQTR 

251 VHAGMDYQQA KETADAPLSY AVYDSQGYAT AFGPKDNPAT NWSNSRNRAL 

301 NLFAGIEHRF NQDWKLKAEY DYTRSRFRQP YGVAGVLSID HSTAATDLIP 

351 GYWHADPRTH SASMSLTGKY RLFGREHDLI AGINGYKYAS NKYGERSIIP 

401 NAIPNAYEFS RTGAYPQPSS FAQTIPQYDT RRQIGGYLAT RFRAADNLSL 

451 ILGGRYSRYR AGSYNSRTQG MTYVSANRFT PYTGIVFDLT GNLSLYGSYS 

501 SLFVPQLQKD EHGSYLKPVT GNNLEADIKG EWLEGRLNAS AAVYRARKNN 

551 LATAAGRDQS GNTYYRAANQ AKTHGWEIEV GGRITPEWQI QAGYSQSKPR 

601 DQDGSRLNPD SVPERSFKLF TAYHLAPEAP SGRTIGAGVR RQGETHTDPA

651 ALRIPNPAAK ARAVANSRQK AYAVADIMAR YRFNPRTELS LNVDNLFNKH 

701 YRTQPDRHSY GALRTVNAAF TYRFK* 

ORF23ng-1 and ORF23-1 show 95.9% identity in 725 aa overlap:



10 20 30 40 50 60 

orf 23-1, pep MTRFKYSLLFAALLPVYAQADVSVSDDPKPQESTELPTITVTADRTASSNDGYTVSGTHT 
I I I I I 1 I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 1 
orf23ng-l MTRFKYSLLFAALLPVYAQADVSVSDDPKPQESTELPTITVTADRTASSNDGYTVSGTHT 

10 20 30 40 50 60 

70 80 90 100 110 120 

orf 23-1. pep PLGLPMTLREIPQSVSVITSQQMRDQNIKTLDRALLQATGTSRQIYGSDRAGYNYLFARG 
I : I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I II II I I 1 I I I I II I I I I I II 
orf23ng-l PFGLPMTLREIPQSVSVITSQQMRDQNIKTLDRALLQATGTSRQIYGSDRAGYNYLFARG 

70 80 90 100 110 120 



130 140 150 160 170 180 

orf 23-1 . pep SRIANYQINGIPVADALADTGNANTAAYERVEWRGVAGLLDGTGEPSATVNLVRKRLTR 
I I I I I I I I I I I I I I I I I 1 I I I I II I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I : I I 
orf23ng-l SRI ANYQINGI PVADALADTGNANTAAYERVEWRGVAGLPDGTGE PSATVNLVRKHPTR 

130 140 150 160 170 180 



190 200 210 220 230 240 

orf 23-1 . pep KPLFEVRAEAGNRKHFGLDADVSGSLNTEGTLRGRLVSTFGRGDSWRRRERSRDAELYGI 
I I I II I I I I I I I I I I I I I I I I II I II : I I 1 I II I I I I I I I I I I II I : I IN. Ml i 
orf23ng-l KPLFEVRAEAGNRKHFGLGADVSGSLNAEGTLRGRLVSTFGRGDSWRQLERSRDAELYGI 

190 200 210 220 230 240 



250 260 270 280 290 300 

orf 23-1 . pep LEYDIAPQTRVHAGMDYQQAKETADAPLSYAVYDSQGYATAFGPKDNPATNWANSRHRAL 
I I ! I II M I I 1 I I I I I I I I II I I I I 1 i I M I I I I I I I ! I 1 I M I I I I I I I I I : I I I : I I I 
orf23ng-l LEYDIAPQTRVHAGMDYQQAKETADAPLSYAVYDSQGYATAFGPKDNPATNWSNSRNRAL 

250 260 270 280 290 300 

310 320 330 340 350 360 

orf 23-1. pep NLFAGIEHRFNQDWKLKAEYDYTRSRFRQPYGVAGVLSIDHNTAATDLIPGYWHADPRTH 
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I : I I I I I I I I I I I I I I I I I I 
orf23ng-l NLFAGIEHRFNQDWKLKAEYDYTRSRFRQPYGVAGVLSIDHSTAATDLIPGYWHADPRTH 
310 320 330 340 350 360

370 380 390 400 410 420 

orf 23-1 . pep S AS VS L I GKYRLFGREHDLI AG INGYKYASNKYGERS 1 1 PN AI PNAYE FSRTGAY PQPAS 
M I : I I I I I I I I I I I I I I 1 I I I I I I I I I I ! I I I 1 I I I I I I I 1 I I I M I I I i I I I I I I : I 
orf23ng-l SASMSLTGKYRLFGREHDLIAG INGYKYASNKYGERS I I PNAI PNAYE FSRTGAYPQPSS 

370 380 390 400 410 420 

430 440 450 460 470 480 

orf 23-1 . pep FAQTIPQYGTRRQIGGYLATRFRAADNLSLILGGRYTRYRTGSYDSRTQGMTYVSANRFT 
III III II II I I I I I I I I I I I I I I M M I I I I I I I : I I I : I I I : I I I I I I I I I I I I I I I 
orf23ng-l FAQTIPQYDTRRQIGGYLATRFRAADNLSLILGGRYSRYRAGSYNSRTQGMTYVSANRFT 

430 440 450 460 470 480 

490 500 510 520 530 540 

orf 23-1 . pep PYTGIVFDLTGNLSLYGSYSSLFVPQSQKDEHGSYLKPVTGNNLEAGIKGEWLEGRLNAS 
I I I I I I I I I I I I I I I I I I I I I I I I I I I I 1 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
orf23ng-l PYTGIVFDLTGNLSLYGSYSSLFVPQLQKDEHGSYLKPVTGNNLEADIKGEWLEGRLNAS 

490 500 510 520 530 540 

550 560 570 580 590 600 

orf 23-1. pep AAVYRARKNNLATAAGRDPSGNTYYRAANQAKTHGWEIEVGGRITPEWQIQAGYSQSKTR 
I I I I I I I I I I I I I I I I I I i I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
orf23ng-l AAVYRARKNNLATAAGRDQSGNTYYRAANQAKTHGWEIEVGGRITPEWQIQAGYSQSKPR 

550 560 570 580 590 600 

610 620 630 640 650 660 

orf 23-1 . pep DQDGSRLNPDSVPERSFKLFTAYHFAPEAPSGWTIGAGVRWQSETHTDPATLRIPNPAAK 
I I I I I I I I I I I I I I I I I I I I I I I I : I I I I I I I I I I I I I I I : I I fl I I I : I I I I I I I I I 
orf23ng-l DQDGSRLNPDSVPERSFKLFTAYHLAPEAPSGRTIGAGVRRQGETHTDPAALRIPNPAAK 

610 620 630 640 650 660 

670 680 690 700 710 720 

or f 2 3-1 . pep ARAADNSRQKAYAVADIMARYRFNPRAELSLNVDNLFNKHYRTQPDRHSYGALRTVNAAF 
111: II I I I I I II I I I I I I I I I I II : I I I I I I I I I I I I I I I I I II I i I I I I I I I I I I I I 
orf23ng-l ARAVANSRQKAYAVADIMARYRFNPRTELSLNVDNLFNKHYRTQPDRHSYGALRTVNAAF 

670 680 690 700 710 720 



orf 23-1. pep TYRFKX 
I I I I I I 

orf23ng-l TYRFKX 

In addition, ORF23ng-1 shows significant homology with an OMP from E. coli:

sp|P16869|FHUE_ECOLI OUTER-MEMBRANE RECEPTOR FOR FE(III)-COPROGEN,
FE(III)-FERRIOXAMINE B AND FE(III)-RHODOTRULIC ACID PRECURSOR
>gi|1651542|gnl|PID|d1015403 (D90745) Outer membrane protein FhuE precursor [Escherichia coli]
>gi|1651545|gnl|PID|d1015405 (D90746) Outer membrane protein FhuE precursor [Escherichia coli]
>gi|1787344 (AE000210) outer-membrane receptor for Fe(III)-coprogen, Fe(III)-ferrioxamine B
and Fe(III)-rhodotrulic acid precursor [Escherichia coli]
Length = 729
Score = 332 bits (843), Expect = 3e-90 

Identities = 228/717 (31%), Positives = 350/717 (48%), Gaps = 60/717 (8%)

Query: 38  TITVTADRTASSN--DGYTVSGTHTPFGLPMTLREIPQSVSVITSQQMRDQNIKTLDRAL 95
           T+ V    TA  +  + Y+V+ T     + MT R+IPQSV++++ Q+M DQ ++TL   +
Sbjct: 43  TVIVEGSATAPDDGENDYSVTSTSAGTKMQMTQRDIPQSVTIVSQQRMEDQQLQTLGEVM 102

Query: 96  LQATGTSRQIYGSDRAGYNYLFARGSRIANYQINGIP--------VADALADTGNANTAA 147
               G S+    SDRA Y   ++RG +I NY ++GIP        + DAL+D      A
Sbjct: 103 ENTLGISKSQADSDRALY---YSRGFQIDNYMVDGIPTYFESRWNLGDALSDM-----AL 154

Query: 148 YERVEVVRGVAGLPDGTGEPSATVNLVRKHPTRKPLF-EVRAEAGNRKHFGLGADVSGSL 206
           +ERVEVVRG  GL  GTG PSA +N+VRKH T +    +V AE G+       AD+   L
Sbjct: 155 FERVEVVRGATGLMTGTGNPSAAINMVRKHATSREFKGDVSAEYGSWNKERYVADLQSPL 214

Query: 207 NAEGTLRGRLVSTFGRGDSWRQLERSRDAELYGILEYDIAPQTRVHAGMDYQQAKETADA 266
             +G +R R+V  +   DSW     S      GI++ D+   T + AG +YQ+    +
Sbjct: 215 TEDGKIRARIVGGYQNNDSWLDRYNSEKTFFSGIVDADLGDLTTLSAGYEYQRIDVNSPT 274

Query: 267 PLSYAVYDSQGYATAFGPKDNPATNWSNSRNRALNLFAGIEHRFNQDWKLKAEYDYTRSR 326
                 +++ G + ++    + A +W+ +      +F  ++ +F   W+      ++
Sbjct: 275 WGGLPRWNTDGSSNSYDRARSTAPDWAYNDKEINKVFMTLKQQFADTWQATLNATHSEVE 334

Query: 327 F--RQPYGVAGVLSIDHSTAA--TDLIPGY-------WHADPRTHSA-SMSLTGKYRLFG 374
           F  +  Y  A V   D       ++  PG+       W++  R   A  +   G Y LFG
Sbjct: 335 FDSKMMYVDAYVNKADGMLVGPYSNYGPGFDYVGGTGWNSGKRKVDALDLFADGSYELFG 394

Query: 375 REHDLIAGINGYKYASNKYGER--SIIPNAIPNAYEFSRTGAYPQPSSFAQTIPQYDTRR 432
           R+H+L+ G   Y   +N+Y     +I P+ I + Y F+  G +PQ     Q++ Q DT
Sbjct: 395 RQHNLMFG-GSYSKQNNRYFSSWANIFPDEIGSFYNFN--GNFPQTDWSPQSLAQDDTTH 451

Query: 433 QIGGYLATRFRAADNLSLILGGRYSRYRAGSYNSRTQGMTY-VSANRFTPYTGIVFDXXX 491
               Y ATR   AD L LILG RY+ +R  +       +TY +  N  TPY G+VFD
Sbjct: 452 MKSLYAATRVTLADPLHLILGARYTNWRVDT-------LTYSMEKNHTTPYAGLVFDIND 504

Query: 492 XXXXXXXXXXXFVPQLQKDEHGSYLKPVTGNNLEADIKGEWLEGRLNASAAVYRARKNNL 551
                      F PQ  +D  G YL P+TGNN E  +K +W+  RL  + A++R  ++N+
Sbjct: 505 NWSTYASYTSIFQPQNDRDSSGKYLAPITGNNYELGLKSDWMNSRLTTTLAIFRIEQDNV 564

Query: 552 ATAAGR---DQSGNTYYRAANQAKTHGWEIEVGGRITPEWQIQAGYSQSKPRDQDGSRLN 608
           A + G      +G T Y+A +   + G E E+ G IT  WQ+  G ++    D +G+ +N
Sbjct: 565 AQSTGTPIPGSNGETAYKAVDGTVSKGVEFELNGAITDNWQLTFGATRYIAEDNEGNAVN 624

Query: 609 PDSVPERSFKLFTAYHLAPEAPSGRTIGAGVRRQGETHTDPAALRIPNPAAKARAVANSR 668
           P ++P  + K+FT+Y L P  P   T+G GV  Q   +TD        P    RA
Sbjct: 625 P-NLPRTTVKMFTSYRL-PVMPE-LTVGGGVNWQNRVYTDTV-----TPYGTFRA----E 672

Query: 669 QKAYAVADIMARYRFNPRTELSLNVDNLFNKHYRTQPDRH-SYGALRTVNAAFTYRF 724
           Q +YA+ D+  RY+      L  NV+NLF+K Y T  +    YG  R  +   TY+F
Sbjct: 673 QGSYALVDLFTRYQVTKNFSLQGNVNNLFDKTYDTNVEGSIVYGTPRNFSITGTYQF 729
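The BLAST summary line reports each count as a fraction of the 717-column alignment together with a whole-number percentage. A short Python sketch that parses such a line and recomputes the percentages; the regular expression and the function name `check_blast_stats` are assumptions made for illustration. For this hit the reported values agree with the quotients rounded down (228/717 is 31.8%, reported as 31%).

```python
import re

# Matches fragments such as "Identities = 228/717 (31%)".
STAT = re.compile(r"(\w+)\s*=\s*(\d+)/(\d+)\s*\((\d+)%\)")


def check_blast_stats(line: str) -> dict[str, tuple[int, int]]:
    """Map each statistic to (reported percent, recomputed percent rounded down)."""
    result = {}
    for name, count, length, percent in STAT.findall(line):
        result[name] = (int(percent), 100 * int(count) // int(length))
    return result


summary = "Identities = 228/717 (31%), Positives = 350/717 (48%), Gaps = 60/717 (8%)"
for name, (reported, recomputed) in check_blast_stats(summary).items():
    print(f"{name}: reported {reported}%, recomputed {recomputed}%")
```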

Based on this analysis, it was predicted that these proteins from N. meningitidis and N. gonorrhoeae,
and their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies.

ORF23-1 (77.5 kDa) was cloned in pET and pGEX vectors and expressed in E. coli, as described
above. The products of protein expression and purification were analyzed by SDS-PAGE. Figure
15A shows the results of affinity purification of the His-fusion protein, and Figure 15B shows the
results of expression of the GST-fusion in E. coli. Purified His-fusion protein was used to immunise
mice, whose sera were used for Western blot (Figure 15C) and for ELISA (positive result). These
experiments confirm that ORF23-1 is a surface-exposed protein, and that it is a useful immunogen.
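The 77.5 kDa given for ORF23-1 is a size estimate, and the text does not say how it was derived. A common sequence-based estimate adds up average residue masses plus one water molecule for the termini. The Python sketch below does that, assuming standard average residue masses; it ignores the His tag or GST moiety added by the pET or pGEX constructs, and the function name `average_mass_kda` is hypothetical.

```python
# Average masses (Da) of amino acid residues, i.e. free amino acids minus one water.
RESIDUE_MASS = {
    "A": 71.0788, "R": 156.1875, "N": 114.1038, "D": 115.0886, "C": 103.1388,
    "E": 129.1155, "Q": 128.1307, "G": 57.0519, "H": 137.1411, "I": 113.1594,
    "L": 113.1594, "K": 128.1741, "M": 131.1926, "F": 147.1766, "P": 97.1167,
    "S": 87.0782, "T": 101.1051, "W": 186.2132, "Y": 163.1760, "V": 99.1326,
}
WATER = 18.01524  # added once for the free N- and C-termini


def average_mass_kda(protein: str) -> float:
    """Average molecular mass in kDa; raises KeyError on ambiguous residues such as 'X'."""
    seq = "".join(protein.split()).rstrip("*")  # drop listing spaces and the stop symbol
    return (sum(RESIDUE_MASS[aa] for aa in seq) + WATER) / 1000.0


print(f"{average_mass_kda('G') * 1000:.2f} Da")  # free glycine, about 75.07 Da
```

Passing a full listing such as the ORF23-1 sequence (spaces and the terminal '*' are tolerated) gives the mass of the untagged polypeptide; the fusion proteins are larger by the mass of their tags.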

Example 80

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 673>:

1 ATGCGCACGG CAGTGGTTTT GCTGTTGATC ATGCCGATGG CGGCTTCGTC 

51 GGCAATGATG CCGGAAATGG TGTGCGCGGG CGTGTCGCCG GGAACGGCAA 

101 TCATATCCAA GCCGACCGAA CAAACGGCGG TCATGGCTTC GAGTTTGTCC 

151 AGCGTCAgcA CGCCTGCTTC GGCGgcGgCa ATCATACCTT CGTCTTCGGA

201 AACGGGGATA AACGcGCCAC TCAAACCCCC GACCGCGCTG GAAGCCATCA 

251 TGCCGCCTTT TTTCACGGCA TCGTTCAGCA ATGCCAAAGC TGCTGTTGTG 

301 CCGTGCGTAC CGCAGACGCT CAAGCCCATT TnTTCAAGAA TGCGTGCCAC 

351 TnAGTCGCCG ACGGGG . . 

This corresponds to the amino acid sequence <SEQ ID 674; ORF24>:

1 MRTAVVLLLI MPMAASSAMM PEMVCAGVSP GTAIISKPTE QTAVMASSLS
51 SVSTPASAAA IIPSSSETGI NAPLKPPTAL EAIMPPFFTA SFSNAKAAVV
101 PCVPQTLKPI XSRMRATXSP TG..
Further work revealed the complete nucleotide sequence <SEQ ID 675>:

1 ATGCGCACGG CAGTGGTTTT GCTGTTGATC ATGCCGATGG CGGCTTCGTC 

51 GGCAATGATG CCGGAAATGG TGTGCGCGGG CGTGTCGCCG GGAACGGCAA 

101 TCATATCCAA GCCGACCGAA CAAACGGCGG TCATGGCTTC GAGTTTGTCC 

151 AGCGTCAGCA CGCCTGCTTC GGCGGCGGCA ATCATACCTT CGTCTTCGGA 

201 AACGGGGATA AACGCGCCAC TCAAACCCCC GACCGCGCTG GAAGCCATCA 

251 TGCCGCCTTT TTTCACGGCA TCGTTCAGCA ATGCCAAAGC TGCTGTTGTG 

301 CCGTGCGTAC CGCAGACGCT CAAGCCCATT TCTTCAAGAA TGCGTGCCAC 

351 TGAGTCGCCG ACGGCGGGGG TCGGCGCCAG CGACAAGTCG AGAATACCAA 

401 ACGGGATATT CAGCATTTTT GAGGCTTCGC GGCCGATGAG TTCGCCCACG 

451 CGGGTAATTT TGAAAGCAGT TTTCTTCACT ACTTCCGCAA CTTCGGTCAA 

501 TGTCGTTGCA TCTGAATTTT CCAACGCGGC TTTTACGACA CCTGGGCCGG 

551 ATACGCCGAC ATTGATAACG GCATCCGCTT CGCCCGAACC ATGAAACGCG 

601 CCCGCCATAA ACGGGTTGTC TTCCACCGCG TTGCAGAACA CGACAATTTT 

651 AGCGCAGCCG AAACCTTCGG GCGTGATTTC CGCCGTGCGT TTGACGGTTT 

701 CGCCCGCCAG CTTGACCGCA TCCATATTGA TACCGGCACG CGTACTGCCG 

751 ATATTGATGG AGCTGCACAC AATATCGGTA GTCTTCATCG CTTCGGGAAT 

801 GGAGCGGATT AACACCTCAT CCGAAGGCGA CATCCCTTTT TGCACCAACG 

851 CGGAAAAACC GCCGATAAAA GACACACCGA TGGCTTTGGC AGCTTTATCC 

901 AAAGTTTGCG CCACGCTGAC GTAA 

This corresponds to the amino acid sequence <SEQ ID 676; ORF24-1>:

1 MRTAVVLLLI MPMAASSAMM PEMVCAGVSP GTAIISKPTE QTAVMASSLS
51 SVSTPASAAA IIPSSSETGI NAPLKPPTAL EAIMPPFFTA SFSNAKAAVV
101 PCVPQTLKPI SSRMRATESP TAGVGASDKS RIPNGIFSIF EASRPMSSPT
151 RVILKAVFFT TSATSVNVVA SEFSNAAFTT PGPDTPTLIT ASASPEP*NA
201 PAINGLSSTA LQNTTILAQP KPSGVISAVR LTVSPASLTA SILIPARVLP
251 ILMELHTISV VFIASGMERI NTSSEGDIPF CTNAEKPPIK DTPMALAALS
301 KVCATLT* 
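Each amino acid listing here follows from its nucleotide sequence by translation in the first reading frame. A minimal Python sketch, assuming the standard genetic code (NCBI table 1; the bacterial table 11 assigns the same amino acids and differs only in permitted start codons), with '*' for stop codons and 'X' for any codon containing an ambiguous base such as N. The names `CODON_TABLE` and `translate` are illustrative.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (first base varies slowest).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate frame 1 of a DNA string; spaces and lower case are accepted."""
    seq = "".join(dna.split()).upper()
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, usable, 3))


print(translate("ATGCGCACGG CAGTGGTTTT GCTGTTGATC"))  # MRTAVVLLLI
print(translate("CCATGAAACGCG"))                      # P*NA
```

The first call covers nucleotides 1-30 of SEQ ID 675; the second covers nucleotides 589-600, showing that codon 198 is TGA, the in-frame stop printed as '*' in the ORF24-1 sequence.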

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A) 

ORF24 shows 96.4% identity over a 307 aa overlap with an ORF (ORF24a) from strain A of N. 
meningitidis: 

10 20 30 40 50 60 

MRTAWLLLIMPMAASSAMMPEMVCAGVSPGTAIISXPTEQTAVIASSLSNVSTPASAAA 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I III I Ml:l I I I Ml I Ml III I 
MRTAWLLLIMPMAASSAMMPEMVCAGVSPGTAI I SKPTEQTAVMASSLS SVSTPASAAA 

10 20 30 40 50 60 

70 80 90 100 110 120 

I I PS SSXTGINAPLKPPTALEAIMPPFFTASFSNAKAAVVPCVPQTLKPI SSRMRATESP 
Mill I MMMMMMMMMMMMMMMMMMMMMMMMMMI 

I I PS SSETGINAPLKPPTALEAIMPPFFT AS FSNAKAAWPCVPQTLKPI SSRMRATESP 
70 80 90 100 110 120 

130 140 150 160 170 180 

TAGVGASDKSRI PNGI FS I FEASRPMSS PTRVI LKAVFFTTSATS VNWASEFSNAAFTT 
II M M I M I M M M I M M M M I M M M I II M M M M M M M M I M M I I M 
TAGVGASDKSRI PNGI FSIFEASRPMSSPTRVILKAVFFTTSATSVNWASEFSNAAFTT 
130 140 150 160 170 180 

190 200 210 220 230 240 

PGPDTPTLITASASPEPXNAPAIXGLSSXALQNTTILAQPKPSSVISXVRLMVSPASLTA 

M M I II I II I I I 1 I I I I I I I I I It It : I I 1 1 I I I I I M 1 M : I I I III Mill Ml 
PGPDTPTLITASASPEPXNAPAINGLSSTALQNTTILAQPKPSGVISAVRLTVSPASLTA 
190 200 210 220 230 240 

250 260 270 280 290 300 

SILIPARVLPILMELHTISWFIASGMERXNTSSEGDIPFCTSAEKPPIKDTPMALAALS 

II I M I I I II I I M I M II I I I I II I II I I M I I I I I I I M : I I I I I I I I I I I M M I I 
SILIPARVLPILMELHTISWFIASGMERINTSSEGDIPFCTNAEKPPIKDTPMALAALS 
250 260 270 280 290 300 
orf24a.pep KVCATLTX
I I I I I I I I 
orf 2 4 KVCATLTX 

The complete length ORF24a nucleotide sequence <SEQ ID 677> is: 

1 ATGCGCACGG CAGTGGTTTT GCTGTTGATC ATGCCGATGG CGGCTTCGTC 

51 GGCAATGATG CCGGAAATGG TGTGCGCGGG TGTGTCGCCG GGAACGGCAA 

101 TCATATCCAA NCCGACCGAA CAAACGGCGG TCATCGCTTC GAGTTTATCC 

151 AACGTCAGCA CGCCTGCTTC GGCGGCGGCA ATCATACCTT CGTCTTCGGA 

201 NACGGGGATA AACGCGCCAC TCAAACCGCC AACCGCGCTC GAAGCCATCA 

251 TGCCGCCCTT TTTCACGGCA TCGTTCAGCA ATGCCAAAGC TGCTGTTGTG 

301 CCGTGCGTAC CGCAGACGCT CAAACCCATT TCTTCAAGAA TGCGCGCCAC 

351 CGAGTCGCCG ACGGCAGGGG TCGGTGCCAG CGACAAGTCG AGAATACCAA 

401 ACGGGATATT CAGCATTTTT GAGGCTTCGC GGCCGATGAG TTCGCCCACG 

451 CGGGTAATTT TGAAGGCGGT TTTCTTCACA ACTTCGGCAA CTTCGGTCAA 

501 TGTCGTTGCA TCCGAATTTT CCAACGCGGC TTTTACGACA CCCGGGCCGG 

551 ATACGCCGAC ATTAATCACA GCATCCGCTT CGCCTGAGCC GTGAAACGCG 

601 CCCGCCATAN ACGGGTTGTC TTCCNCCGCG TTGCAGAACA CGACGATTTT 

651 GGCGCAGCCG AAACCTTCTA GTGTGATTTC ANCCGTGCGT TTGATGGTTT 

701 CGCCCGCCAG TCTGACCGCG TCCATATTGA TACCGGCGCG CGTACTGCCG 

751 ATATTGATGG AGCTGCACAC GATATCAGTA GTCTTCATCG CTTCGGGAAT 

801 GGAACGGATN AACACCTCGT CAGAAGGCGA CATACCTTTT TGCACCAGCG 

851 CGGAAAAGCC GCCAATAAAA GACACGCCGA TGGCTTTGGC AGCCTTATCC 

901 AAAGTTTGCG CCACGCTGAC GTAA 

This encodes a protein having amino acid sequence <SEQ ID 678>: 

1 MRTAVVLLLI MPMAASSAMM PEMVCAGVSP GTAIISXPTE QTAVIASSLS

51 NVSTPASAAA IIPSSSXTGI NAPLKPPTAL EAIMPPFFTA SFSNAKAAVV

101 PCVPQTLKPI SSRMRATESP TAGVGASDKS RIPNGIFSIF EASRPMSSPT

151 RVILKAVFFT TSATSVNVVA SEFSNAAFTT PGPDTPTLIT ASASPEP*NA

201 PAIXGLSSXA LQNTTILAQP KPSSVISXVR LMVSPASLTA SILIPARVLP 

251 ILMELHTISV VFIASGMERX NTSSEGDIPF CTSAEKPPIK DTPMALAALS 

301 KVCATLT* 

It should be noted that this protein includes a stop codon at position 198. 
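Internal stops like this one can be found mechanically by scanning for stop symbols before the final residue. The sketch below uses a hypothetical helper, `internal_stops`, that takes a protein string written like the listings (spaces allowed) plus the number of its first residue. It is an illustration only, not the analysis actually used.

```python
def internal_stops(protein: str, first_residue: int = 1) -> list[int]:
    """Return residue numbers of '*' stop symbols that are not at the C-terminus."""
    seq = "".join(protein.split())  # drop the spacing used in the listings
    last = len(seq) - 1
    return [first_residue + i for i, aa in enumerate(seq) if aa == "*" and i != last]


# Residues 191-200 and 301-308 of the ORF24a listing.
print(internal_stops("ASASPEP*NA", first_residue=191))  # [198]
print(internal_stops("KVCATLT*", first_residue=301))    # []
```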



ORF24a and ORF24-1 show 96.4% identity in 307 aa overlap: 

10 20 30 40 50 60 

orf 24a . pep MRTAWLLLIMPMAASSAMMPEMVCAGVSPGTAIISXPTEQTAVIASSLSNVSTPASAAA 
I I I I I j I I I I I I I I I I II I I I I I I I II I I I I I I I I I I II I II I : II I I I : I I I I I I I I I 
orf 24-1 MRTAWLLLIMPMAASSAMMPEMVCAGVSPGTAIISKPTEQTAVMASSLSSVSTPASAAA 

10 20 30 40 50 60 

70 80 90 100 110 120 

orf 24a. pep IIPSSSXTGINAPLKPPTALEAIMPPFFTASFSNAKAAWPCVPQTLKPISSRMRATESP 

I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I 1 I I I I I I i I I I I I I I I I I I M I I I I II 
orf 24-1 1 1 PS SSETGINAPLKPPTALEAIMPPFFTASFSNAKAAWPCVPQTLKPI SSRMRATESP 

70 80 90 100 110 120 

130 140 150 160 170 180 

orf 24a . pep TAGVGASDKSRIPNGIFSIFEASRPMSSPTRVILKAVFFTTSATSVNWASEFSNAAFTT 
I I I I I I I I I I II II I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
orf 24-1 TAGVGASDKSRIPNGIFSIFEASRPMSSPTRVILKAVFFTTSATSVNWASEFSNAAFTT 

130 140 150 160 170 180 

190 200 210 220 230 240 

orf24a.pep PGPDTPTLITASASPEPXNAPAIXGLSSXALQNTTILAQPKPSSVISXVRLMVSPASLTA
I I I I I I I II I I I I I I I I I I I I I I I I II : I I I I I I I I I I I I I I : I I I III I I I I I I I I 
orf 24-1 PGPDT PTLITASAS PEPXNAPAINGLSSTALQNTTI LAQPKPSGVI S AVRLTVSPASLTA 

190 200 210 220 230 240 

250 260 270 280 290 300 

orf 24a. pep SILIPARVLPILMELHTISWFIASGMERXNTSSEGDIPFCTSAEKPPIKDTPMALAALS 
I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I II I I : I I I I I I I I I I I I I I I I I 
orf 24-1 SILIPARVLPILMELHTISWFIASGMERINTSSEGDIPFCTNAEKPPIKDTPMALAALS 

250 260 270 280 290 300 
orf24a.pep  KVCATLTX
            ||||||||
orf24-1     KVCATLTX



Homology with a predicted ORF from N. gonorrhoeae 

ORF24 shows 96.7% identity over a 121 aa overlap with a predicted ORF (ORF24ng) from 
N. gonorrhoeae: 

orf 24 . pep MRTAWLLLIMPMAASSAMMPEMVCAGVSPGTAIISKPTEQTAVMASSLSSVSTPASAAA 60 

I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I : I I I I I I I II I I I I I I I I : I I I I I I I 
orf24ng MRTAVVLLLIMPMAASSAMMPEMVCAGVSPGTAIMSKPTEQTAVMASSLSSVNTPASAAA 60 



orf 24. pep IIPSSSETGINAPLKPPTALEAIMPPFFTASFSNAKAAWPCVPQTLKPIXSRMRATXSP 120 

I I I II I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I II 

orf24ng 1 1 PS SSETGINAPLKPPTALEAIMPPFFTAS FSNAKAAWPCVPQTLKPI SSRMRATESP 120 

orf 24. pep TG 122 
I : 

orf24ng TAGVGASDKSRMPNGI FS I FEASRPMSSPTRVILKAVFFTTSATSVRLTASEFSSAALTT 180 



The complete length ORF24ng nucleotide sequence <SEQ ID 679> is:



1 ATGCGCACGG CGGTGGTTTT GCTGTTGATC ATGCCGATGG CGGCTTCGTC 

51 GGCGATGATG CCGGAAATGG TGTGCGCGGG CGTGTCGCCG GGAACGGCAA 

101 TCATGTCCAA ACCAACGGAG CAGACGGCGG TCATGGCTTC GAGTTTGTCC 

151 AGCGTCAACA CGCCTGCCTC GGCGGCGGCA ATCATACCTT CGTCTTCGGA 

201 AACGGGGATA AACGCGCCGC TCAAACCGCC GACCGCGCTG GAAGCCATCA 

251 TGCCGCCCTT TTTCACGGCA TCGTTCAGCA ATGCCAAAGC TGCTGTTGTG 

301 CCGTGCGTAC CGCAGACGCT CAAGCCCATT TCTTCAAGAA TGCGCGCCAC 

351 CGAGTCGCCG ACGGCGGGGG TCGGTGCCAG CGACAAATCG AGAATGCCGA 

401 ACGGGATATT CAGCATTTTT GAGGCTTCGC GACCGATGAG TTCGCCCACG 

451 CGGGTGATTT TGAAAGCGGT TTTCTTCACG ACTTCGGCGA CCTCGGTCAG 

501 GCTGACCGCG TCCGAATTTT CCAGCGCGGC TTTGACCACG CCTGGACCGG 

551 ATACGCCGAC ATTAATCACA GCATCCGCTT CGCCCGAGCC GTGGAACGCA 

601 CCCGCCATAA ACGGATTGTC TTCCACCGCG TTGCAGAACA CGACGATTTT 

651 GGCGCAGCCG AAACCTTCGG GTGTGATTTC AGCCGTGCGT TTGATGGTTT 

701 CGCCTGCCAG CTTGACCGCA TCCATATTGA TACCGGCACG CGTGCTGCCG 

751 ATATTGATGG AGCTGCACAC GATATCGGTA GTTTTCATCG CTTCGGGAAC 

801 GGAACGGATC AACACCTCAT CCGAAGGCGA CATACCTTTT TGCACCAGCG

851 CGGAAAAGCC GCCGATAAAG GACACGCCGA TGGCTTTGGC TGCCTTGTCC 

901 AAAGTCTGCG CCACGCTGAC ATAA 

This encodes a protein having amino acid sequence <SEQ ID 680>: 



1 MRTAVVLLLI MPMAASSAMM PEMVCAGVSP GTAIMSKPTE QTAVMASSLS

51 SVNTPASAAA IIPSSSETGI NAPLKPPTAL EAIMPPFFTA SFSNAKAAVV

101 PCVPQTLKPI SSRMRATESP TAGVGASDKS RMPNGIFSIF EASRPMSSPT

151 RVILKAVFFT TSATSVRLTA SEFSSAALTT PGPDTPTLIT ASASPEPWNA

201 PAINGLSSTA LQNTTILAQP KPSGVISAVR LMVSPASLTA SILIPARVLP

251 ILMELHTISV VFIASGTERI NTSSEGDIPF CTSAEKPPIK DTPMALAALS

301 KVCATLT* 

ORF24ng and ORF24-1 show 96.1% identity in 307 aa overlap: 



10 20 30 40 50 60 

orf 24-1 . pep MRT AWLLLIMPMAAS S AMMPEMVCAGVS PGTAI I SKPTEQTAVMAS SLS S VST P ASAAA 
II I i I I I I I I I I I I I I I I I I I I I I I I I I I II I I I : I I I I I II I I I I II I II I : I I I I I I I 
orf24ng MRTAWLLLIMPMAASSAMMPEMVCAGVSPGTAIMSKPTEQTAVMAS SLS SVNTPASAAA 

10 20 30 40 50 60 



70 80 90 100 110 120 

o r f 2 4 - 1 . pep 1 1 PS S SETGIN APLKPPTALEAIMPPFFTAS FSNAKAAWPCVPQTLKPI S SRMRATES P 
I | I I I I I I I I I I I I 1 I I I I I I I I I II II I I M I I I I I I I I 1 I M I I I I I I I I I M I I I I I 
orf24ng HPS SSETGINAPLKPPTALEAIMPPFFTAS FSNAKAAWPCVPQTLKPI SSRMRATESP 

70 80 90 100 110 120 
130 140 150 160 170 180

orf 24-1. pep T AG VGAS DKS R I PNG I FS I FE AS R PM S S PT RV I LKAV FFTT S AT S VN WAS E FSN AAFTT 
I I I I I I I I I I I : I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I ! I : : I I I I I : I I : I I 
orf24ng TAGVGASDKSRMPNGIFSIFEASRPMSSPTRVILKAVFFTTSATSVRLTASEFSSAALTT 

130 140 150 160 170 180 

190 200 210 220 230 240 

orf 24-1 . pep ' PGPDTPTLITASAS PEPXNAPAINGLS STALQNTTI LAQPKPSGVI SAVRLTVS PASLTA 

I I I I I I I I I I I 1 I I I ! I I I 1 E I i I J I 1 I I t I I I I 1 E I 1 I J I 1 1 1 I I I I I I MINIM 
orf24ng PGPDTPTLITASAS PEPWNAPAINGLS STALQNTTI LAQPKPSGVI SAVRLMVS PASLTA 

190 200 210 220 230 240 

250 260 270 280 290 300 

orf 24-1. pep SILIPARVLPILMELHTISWFIASGMERINTSSEGDIPFCTNAEKPPIKDTPMALAALS 

II II I I I II I I I I I I I II M I II II I II I I I I M I I II I II :\ I II I I I M I II II I II 
orf24ng SILIPARVLPILMELHTISWFIASGTERINTSSEGDIPFCTSAEKPPIKDTPMALAALS 

250 260 270 280 290 300 

orf24-l.pep KVCATLTX 
I I I I M I I 

orf24ng KVCATLTX 

Based on this analysis, including the presence of a putative leader sequence (first 18 aa,
double-underlined) and putative transmembrane domains (single-underlined) in the gonococcal protein,
it is predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could
be useful antigens for vaccines or diagnostics, or for raising antibodies.
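Leader and transmembrane predictions rest on runs of hydrophobic residues. One widely used way to expose such runs is a Kyte-Doolittle hydropathy profile, a sliding-window average of per-residue scores. The document does not name the method it used, so the Python sketch below is an illustrative stand-in: the function name is hypothetical and the window of 9 is an arbitrary choice (windows near 19 are typical when scanning for membrane-spanning helices).

```python
# Kyte & Doolittle (1982) hydropathy scale.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein: str, window: int = 9) -> list[float]:
    """Mean hydropathy of each full window; value i covers residues i+1 .. i+window."""
    seq = "".join(protein.split()).rstrip("*")
    scores = [KYTE_DOOLITTLE[aa] for aa in seq]
    return [sum(scores[i:i + window]) / window for i in range(len(scores) - window + 1)]


# First 18 residues of ORF24ng, the putative leader.
leader = "MRTAVVLLLIMPMAASSA"
WINDOW = 9
profile = hydropathy_profile(leader, WINDOW)
best = max(range(len(profile)), key=profile.__getitem__)
print(f"most hydrophobic window: residues {best + 1}-{best + WINDOW}, mean {profile[best]:.2f}")
```

Positive window averages indicate hydrophobic stretches; where a threshold is applied, it depends on the scale and window chosen.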

Example 81 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 681>:

1 . . ACCGACGTGC AAAAAGAGTT GGTCGGCGAA CAACGCAAGT GGGCGCAGGA 

51 AAAAATCAGC AACTGCCGAC AAGCCGCCGC GCAGGCAGAC CGGCAGGAAT 

101 ACGCCGAATA CCTCAAGCTG CAATGCGACA CGCGGATGAC GCGCGAACGG 

151 ATACAGTATC TTCGCGGCTA TTCCATCGAT TAG 

This corresponds to the amino acid sequence <SEQ ID 682; ORF25>: 

1 . .TDVQKELVGE QRKWAQEKIS NCRQAAAQAD RQEYAEYLKL QCDTRMTRER 
51 IQYLRGYSID * 

Further work revealed the complete nucleotide sequence <SEQ ID 683>: 

1 ATGTATCGGA AACTCATTGC GCTGCCGTTT GCCCTGCTGC TTGCCGCTTG 

51 CGGCAGGGAA GAACCGCCCA AGGCATTGGA ATGCGCCAAC CCCGCCGTGT 

101 TGCAAGGCAT ACGCGGCAAT ATTCAGGAAA CGCTCACGCA GGAAGCGCGT 

151 TCTTTCGCGC GCGAAGACGG CAGGCAGTTT GTCGATGCCG ACAAAATTAT 

201 CGCCGCCGCC TACGGTTTGG CGTTTTCTTT GGAACACGCT TCGGAAACGC 

251 AGGAAGGCGG GCGCACGTTC TGTATCGCCG ATTTGAACAT TACCGTGCCG 

301 TCTGAAACGC TTGCCGATGC CAAGGCAAAC AGCCCCCTGT TGTACGGGGA 

351 AACTGCTTTG TCGGATATTG TGCGGCAGAA GACGGGCGGC AATGTCGAGT 

401 TTAAAGACGG CGTATTGACG GCAGCCGTCC GCTTCCTGCC CGTCAAAGAC 

451 GGTCAGACGG CATTTGTCGA CAACACGGTC GGTATGGCGG CGCAAACGCT 

501 GTCTGCCGCG CTGCTGCCTT ACGGCGTGAA GAGCATCGTG ATGATAGACG 

551 GCAAGGCGGT GAAAAAAGAA GACGCGGTCA GGATTTTGAG CGGAAAAGCC 

601 CGTGAAGAAG AACCGTCCAA ACCCACGCCC GAAGACATTT TGGAACACAA 

651 TGCCGCCGGC GGCGATGCGG GCGTACCCCA AGCCGCAGAA GGCGCGCCCG 

701 AACCGGAAAT CCTGCATCCT GACGACGGCG AGCGTGCCGA TACCGTTACC 

751 GTATCACGGG GCGAAGTGGA AGAGGCGCGC GTACAAAACC AGCGTGCGGA 

801 ATCCGAAATT ACCAAACTTT GGGGAGGACT CGATACCGAC GTGCAAAAAG 

851 AGTTGGTCGG CGAACAACGC AAGTGGGCGC AGGAAAAAAT CAGCAACTGC 

901 CGACAAGCCG CCGCGCAGGC AGACCGGCAG GAATACGCCG AATACCTCAA 

951 GCTGCAATGC GACACGCGGA TGACGCGCGA ACGGATACAG TATCTTCGCG 

1001 GCTATTCCAT CGATTAG 
This corresponds to the amino acid sequence <SEQ ID 684; ORF25-1>:

1 MYRKLIALPF ALLLAACGRE EPPKALECAN PAVLQGIRGN IQETLTQEAR

51 SFAREDGRQF VDADKIIAAA YGLAFSLEHA SETQEGGRTF CIADLNITVP 

101 SETLADAKAN SPLLYGETAL SDIVRQKTGG NVEFKDGVLT AAVRFLPVKD 

151 GQTAFVDNTV GMAAQTLSAA LLPYGVKSIV MIDGKAVKKE DAVRILSGKA 

201 REEEPSKPTP EDILEHNAAG GDAGVPQAAE GAPEPEILHP DDGERADTVT

251 VSRGEVEEAR VQNQRAESEI TKLWGGLDTD VQKELVGEQR KWAQEKISNC 

301 RQAAAQADRQ EYAEYLKLQC DTRMTRERIQ YLRGYSID* 
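Each amino acid sequence in this section is the conceptual translation of the nucleotide sequence given just before it. As a minimal sketch of that step, assuming the standard genetic code (NCBI translation table 1) and reading in frame 1, the helper below translates a DNA string. The name `translate` is illustrative, and this is not the software used in the original analysis. Codons containing an ambiguity code such as N are rendered as X, mirroring the X residues that appear where a nucleotide sequence contains N.

```python
# Standard genetic code (NCBI translation table 1). Bases are ordered
# T, C, A, G at each of the three codon positions.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate DNA in frame 1; '*' marks a stop codon.

    Codons containing ambiguity codes (e.g. N) become 'X', and a trailing
    incomplete codon is ignored.
    """
    seq = dna.upper().replace(" ", "")
    protein = []
    for pos in range(0, len(seq) - 2, 3):
        protein.append(CODON_TABLE.get(seq[pos:pos + 3], "X"))
    return "".join(protein)


# First 50 nucleotides of <SEQ ID 683>.
print(translate("ATGTATCGGAAACTCATTGCGCTGCCGTTTGCCCTGCTGCTTGCCGCTTG"))
# -> MYRKLIALPFALLLAA
```

Applied to the first 50 nucleotides of <SEQ ID 683>, it returns the N-terminal sixteen residues of ORF25-1; the trailing partial codon is ignored.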

Computer analysis of this amino acid sequence gave the following results: 



Homology with a predicted ORF from N. meningitidis (strain A)

ORF25 shows 98.3% identity over a 60aa overlap with an ORF (ORF25a) from strain A of N. 
meningitidis: 

10 20 30 

orf25.pep TDVQKELVGEQRKWAQEKISNCRQAAAQAD

llliill III M M I I I I M I I I M I I I I 
orf25a VTVSRGEVEEARVQNQRAESEITKLWGGLDTDVQKELVGEXRKWAQEKISNCRQAAAQAD

250 260 270 280 290 300 

40 50 60 

orf25.pep RQEYAEYLKLQCDTRMTRERIQYLRGYSIDX
I I t I I I I I I I I I I I I I I 1 I I I I I I I I I I I I I 
orf25a RQEYAEYLKLQCDTRMTRERIQYLRGYSIDX

310 320 330 

The complete length ORF25a nucleotide sequence <SEQ ID 685> is: 

1 ATGTATCGGA AACTCATTGC GCTGCCGTTT GCCCTGCTGC TTGCCGCTTG 

51 CGGCAGGGAA GAACCGCCCA AGGCATTGGA ATGCGCCAAC CCCGCCGTGT 

101 TGCAANGCAT ACGCNGCAAT ATTCAGGAAA CGCTCACGCA GGAAGCGCGT 

151 TCTTTCGCGC GCGAAGACNG CANGCAGTTT GTCGATGCCG ACNAAATTAT 

201 CGCCGCCGCC TANGNTNNGN NGNTNTCTTT GGAACACGCT TCGGAAACGC 

251 AGGAAGGCGG GCGCACGTTC TGTNTCGCCG ATTTGAACAT TACCGTGCCG 

301 TCTGAAACGC TTGCCGATGC CAAGGCAAAC AGCCCCCTGC TGTACGGGGA 

351 AACCGCTTTG TCGGATATTG TGCGGCAGAA GACGGGCGGC AATGTCGAGT 

401 TTAAAGACGG CGTATTGACG GCAGCCGTCC GCTTCCTACC CGTCAAAGAC 

451 GGTCAGANGG CATTTGTCGA CAACACGGTC GGTATGGCGG CGCAAACGCT 

501 GTCTGCCGCG TTGCTGCCTT ACGGCGTGAA GAGCATCGTG ATGATAGACG 

551 GCAAGGCGGT AAAAAAAGAA GACGCGGTCA GGATTNTGAG CNGANAAGCC 

601 CGTGAANAAG AACCGTCCAA ANCCNNGCCC GAAGACATTT TGGAACATAA 

651 TGCCGCCGGA GGGGATGCAG ACGTACCCCA AGCCGGAGAA GACGCGCCCG 

701 AACCGGAAAT CCTGCATCCT GACGACGGCG AGCGTGCCGA TACCGTTACC 

751 GTATCACGGG GCGAAGTGGA AGAGGCGCGN GTACAAAACC AGCGTGCGGA 

801 ATCCGAAATT ACCAAACTTT GGGGAGGACT CGATACCGAC GTGCAAAAAG 

851 AGTTGGTCGG CGAANAACGC AAGTGGGCGC AGGAAAAAAT CAGCAACTGC 

901 CGACAAGCCG CCGCGCAGGC AGACCGGCAG GAATACGCCG AATACCTCAA 

951 GCTGCAATGC GACACGCGGA TGACGCGCGA ACGGATACAG TATCTTCGCG 

1001 GCTATTCCAT CGATTAG 

This encodes a protein having amino acid sequence <SEQ ID 686>: 



1 MYRKLIALPF ALLLAA CGRE EPPKALECAN PAVLQXIRXN IQETLTQEAR 

51 SFAREDXXQF VDADXIIAAA XXXXXSLEHA SETQEGGRTF CXADLNITVP 

101 SETLADAKAN SPLLYGETAL SDIVRQKTGG NVEFKDGVLT AAVRFLPVKD 

151 GQXAFVDNTV GMAAQTLSAA LLPYGVKSIV MIDGKAVKKE DAVRIXSXXA 

201 REXEPSKXXP EDILEHNAAG GDADVPQAGE DAPEPEILHP DDGERADTVT 

251 VSRGEVEEAR VQNQRAESEI TKLWGGLDTD VQKELVGEXR KWAQEKISNC 

301 RQAAAQADRQ EYAEYLKLQC DTRMTRERIQ YLRGYSID* 

ORF25a and ORF25-1 show 93.5% identity in 338 aa overlap: 



10 20 30 40 50 60 

orf25a.pep MYRKLIALPFALLLAACGREEPPKALECANPAVLQXIRXNIQETLTQEARSFAREDXXQF
I I I I I II I I II II I II I I III MM I I II I Ml M M I I ! I I I I I I N II I I II II 
orf25-1 MYRKLIALPFALLLAACGREEPPKALECANPAVLQGIRGNIQETLTQEARSFAREDGRQF

10 20 30 40 50 60



70 80 90 100 110 120 

orf25a.pep VDADXIIAAAXXXXXSLEHASETQEGGRTFCXADLNITVPSETLADAKANSPLLYGETAL
Nil ! ! I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I 

orf25-1 VDADKIIAAAYGLAFSLEHASETQEGGRTFCIADLNITVPSETLADAKANSPLLYGETAL

70 80 90 100 110 120 

130 140 150 160 170 180 

orf25a.pep SDIVRQKTGGNVEFKDGVLTAAVRFLPVKDGQXAFVDNTVGMAAQTLSAALLPYGVKSIV
I I I I I I I I I I I I I I I I I I I I I I 1 I i I I I I I I I : I I I I I I I I I I t I I I I I I i I I I I I I I I I 
orf25-1 SDIVRQKTGGNVEFKDGVLTAAVRFLPVKDGQTAFVDNTVGMAAQTLSAALLPYGVKSIV

130 140 150 160 170 180 

190 200 210 220 230 240 

orf25a.pep MIDGKAVKKEDAVRIXSXXAREXEPSKXXPEDILEHNAAGGDADVPQAGEDAPEPEILHP
I I I I I I I I I I I II I I I III Mil : I I I I I I I I I I I I I I i I I I : I I I I I I i I I I 
orf25-1 MIDGKAVKKEDAVRILSGKAREEEPSKPTPEDILEHNAAGGDAGVPQAAEGAPEPEILHP

190 200 210 220 230 240 

250 260 270 280 290 300 

orf25a.pep DDGERADTVTVSRGEVEEARVQNQRAESEITKLWGGLDTDVQKELVGEXRKWAQEKISNC
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 1 
orf25-1 DDGERADTVTVSRGEVEEARVQNQRAESEITKLWGGLDTDVQKELVGEQRKWAQEKISNC

250 260 270 280 290 300 

310 320 330 339 

orf25a.pep RQAAAQADRQEYAEYLKLQCDTRMTRERIQYLRGYSIDX
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I 
orf25-1 RQAAAQADRQEYAEYLKLQCDTRMTRERIQYLRGYSIDX

310 320 330 
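The identity figure quoted for ORF25a and ORF25-1 can be approximated by counting identical positions across the ungapped overlap. The sketch below assumes that an unknown residue (X) never counts as a match; the alignment program used in the original analysis may treat X, gaps and the overlap length differently, and `percent_identity` is an illustrative helper rather than that program.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent identical positions over an ungapped overlap.

    Both strings must be the same length and already aligned; an unknown
    residue (X) is never counted as a match.
    """
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(1 for a, b in zip(seq_a, seq_b) if a == b and a != "X")
    return 100.0 * matches / len(seq_a)


# Residues 1-60 of ORF25a and ORF25-1, as aligned above.
orf25a_1_60 = "MYRKLIALPFALLLAACGREEPPKALECANPAVLQXIRXNIQETLTQEARSFAREDXXQF"
orf25_1_1_60 = "MYRKLIALPFALLLAACGREEPPKALECANPAVLQGIRGNIQETLTQEARSFAREDGRQF"
print(round(percent_identity(orf25a_1_60, orf25_1_1_60), 2))  # -> 93.33
```

Over the full 338-residue overlap, 22 positions differ (most of them X residues in ORF25a), so the same arithmetic gives 316/338, or about 93.5%, in line with the value reported above.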

Homology with a predicted ORF from N. gonorrhoeae

ORF25 shows 100% identity over a 60aa overlap with a predicted ORF (ORF25ng) from 
N. gonorrhoeae: 

orf25.pep                               TDVQKELVGEQRKWAQEKISNCRQAAAQAD 30
                                        ||||||||||||||||||||||||||||||
orf25ng   VTVSRGEVEEARVQNQRAESEITKLWGGLDTDVQKELVGEQRKWAQEKISNCRQAAAQAD 308

orf25.pep RQEYAEYLKLQCDTRMTRERIQYLRGYSID 60
          ||||||||||||||||||||||||||||||
orf25ng   RQEYAEYLKLQCDTRMTRERIQYLRGYSID 338

The complete length ORF25ng nucleotide sequence <SEQ ID 687> is:



1 ATGTATCGGA AACTCATTGC GCTGCCGTTT GCCCTGCTGC TTGCAGCGTG

51 CGGCAGGGAA GAACCGCCCA AGGCGTTGGA ATGCGCCAAC CCCGCCGTGT

101 TGCAGGACAT ACGCGGCAGT ATTCAGGAAA CGCTCACGCA GGAAGCGCGT

151 TCTTTCGCGC GCGAAGACGG CAGGCAGTTT GTCGATGCCG ACAAAATTAT

201 CGCCGCCGCC TACGGTTTGG CGTTTTCTTT GGAACACGCT TCGGAAACGC

251 AGGAAGGCGG GCGCACGTTC TGTATCGCCG ATTTGAACAT TACCGTGCCG

301 TCTGAAACGC TTGCCGATGC CGAGGCAAAC AGCCCCCTGC TGTATGGGGA

351 AACGTCTTTG GCAGACATCG TGCAGCAGAA GACGGGCGGC AATGTCGAGT

401 TTAAAGACGG CGTATTGACG GCAGCCGTCC GCTTCCTGCC CGCCAAAGAC

451 GCTCGGACGG CATTTATCGA CAACACGGTC GGTATGGCGA CGCAAACGCT

501 GTCTGCCGCG TTGCTGCCTT ACGGCGTGAA GAGCATCGTG ATGATAGACG

551 GCAAGGCGGT GACAAAAGAA GACGCGGTCA GGGTTTTGAG CGGCAAAGCC

601 CGTGAAGAAG AACCGTCCAA ACCCACCCCC GAAGACATTT TGGAACACAA

651 TGCCGCCGGC GGCGATGCGG GCGTACCCCA AGCCGCAGAA GGCGCACCCG

701 AACCCGAAAT CCTGCATCCC GACGACGTCG AGCGTGCCGA TACCGTTACC

751 GTATCACGGG GCGAAGTGGA AGAGGCGCGC GTACAAAACC AACGTGCGGA

801 ATCCGAAATT ACCAAACTTT GGGGAGGACT CGATACCGAC GTGCAAAAAG

851 AGTTGGTCGG CGAACAGCGC AAGTGGGCGC AGGAAAAAAT CAGcaactgc

901 cgACAAGCCG CCGCGCAGGC AGACCGGCAG GAATACGCCG AATACCTCAA

951 GCTCCAATGC GACACGCGGA TGACGCGCGA ACggaTACAG TATCTTCGCG

1001 GCTATTCCAT CGATTAG



This encodes a protein having amino acid sequence <SEQ ID 688>: 
1 MYRKLIALPF ALLLAACGRE EPPKALECAN PAVLQDIRGS IQETLTQEAR

51 SFAREDGRQF VDADKIIAAA YGLAFSLEHA SETQEGGRTF CIADLNITVP

101 SETLADAEAN SPLLYGETSL ADIVQQKTGG NVEFKDGVLT AAVRFLPAKD

151 ARTAFIDNTV GMATQTLSAA LLPYGVKSIV MIDGKAVTKE DAVRVLSGKA

201 REEEPSKPTP EDILEHNAAG GDAGVPQAAE GAPEPEILHP DDVERADTVT

251 VSRGEVEEAR VQNQRAESEI TKLWGGLDTD VQKELVGEQR KWAQEKISNC 

301 RQAAAQADRQ EYAEYLKLQC DTRMTRERIQ YLRGYSID* 

ORF25ng and ORF25-1 show 95.9% identity in 338 aa overlap: 

10 20 30 40 50 60 

orf25-1.pep MYRKLIALPFALLLAACGREEPPKALECANPAVLQGIRGNIQETLTQEARSFAREDGRQF

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I : I I I I I I I I I I I I I I I I I I I I 
orf25ng MYRKLIALPFALLLAACGREEPPKALECANPAVLQDIRGSIQETLTQEARSFAREDGRQF 

10 20 30 40 50 60 

70 80 90 100 110 120

orf25-1.pep VDADKIIAAAYGLAFSLEHASETQEGGRTFCIADLNITVPSETLADAKANSPLLYGETAL
I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I : I I I I I I I I I I : I 
orf25ng VDADKIIAAAYGLAFSLEHASETQEGGRTFCIADLNITVPSETLADAEANSPLLYGETSL

70 80 90 100 110 120 


130 140 150 160 170 180 

orf25-1.pep SDIVRQKTGGNVEFKDGVLTAAVRFLPVKDGQTAFVDNTVGMAAQTLSAALLPYGVKSIV
: I I i : I I I I i I I I I I I I I I I I I ) I I I I : I I :: I I I : I I I I I I I : i I I I I I I I I I I I I I I I 
orf25ng ADIVQQKTGGNVEFKDGVLTAAVRFLPAKDARTAFIDNTVGMATQTLSAALLPYGVKSIV 
130 140 150 160 170 180

190 200 210 220 230 240 

orf25-1.pep MIDGKAVKKEDAVRILSGKAREEEPSKPTPEDILEHNAAGGDAGVPQAAEGAPEPEILHP
liillli I I I I I I : I I M I I I I I I M I I I I I I I M I I I I I II I I I I I I I I I I I I I I I I I 
orf25ng MIDGKAVTKEDAVRVLSGKAREEEPSKPTPEDILEHNAAGGDAGVPQAAEGAPEPEILHP

190 200 210 220 230 240 

250 260 270 280 290 300 

orf25-1.pep DDGERADTVTVSRGEVEEARVQNQRAESEITKLWGGLDTDVQKELVGEQRKWAQEKISNC
35 * " | I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I II 

orf25ng DDVERADTVTVSRGEVEEARVQNQRAESEITKLWGGLDTDVQKELVGEQRKWAQEKISNC 

250 260 270 280 290 300 

310 320 330 339 

orf25-1.pep RQAAAQADRQEYAEYLKLQCDTRMTRERIQYLRGYSIDX

Mill Ml III MM I Mill MM II I III II MM! I 
orf25ng RQAAAQADRQEYAEYLKLQCDTRMTRERIQYLRGYSIDX

310 320 330 

Based on this analysis, including the presence of a predicted prokaryotic membrane lipoprotein
lipid attachment site (underlined) in the gonococcal protein, it was predicted that the proteins from
N. meningitidis and N. gonorrhoeae, and their epitopes, could be useful antigens for vaccines or
diagnostics, or for raising antibodies.
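The lipoprotein prediction above rests on a sequence motif. As an illustration only, the sketch below encodes the PROSITE PS00013 pattern for the prokaryotic membrane lipoprotein lipid attachment site as we understand it, {DERK}(6)-[LIVMFWSTAG](2)-[LIVMFYSTAGCQ]-[AGS]-C, and accepts a match only if its Cys lies between residues 15 and 35. The transcription of the pattern and the position window are assumptions to check against the PROSITE entry, and `lipobox_cysteine` is a hypothetical helper.

```python
import re

# PROSITE PS00013 (prokaryotic membrane lipoprotein lipid attachment site),
# transcribed by hand from its documented form:
#   {DERK}(6)-[LIVMFWSTAG](2)-[LIVMFYSTAGCQ]-[AGS]-C
# {DERK} means "any residue except D, E, R or K".
LIPOBOX = re.compile(r"[^DERK]{6}[LIVMFWSTAG]{2}[LIVMFYSTAGCQ][AGS]C")


def lipobox_cysteine(protein: str, first: int = 15, last: int = 35):
    """Return the 1-based position of the lipobox Cys, or None.

    Only the leftmost match within the first `last` residues is considered,
    and it is rejected if its Cys lies before position `first` (an assumed
    window for an N-terminal signal peptide).
    """
    match = LIPOBOX.search(protein[:last].upper())
    if match is None or match.end() < first:
        return None
    # match.end() is the 0-based index just past the Cys, which equals
    # the Cys's 1-based position.
    return match.end()


print(lipobox_cysteine("MYRKLIALPFALLLAACGREEPPKALECAN"))  # -> 17
```

Applied to the N-terminus shared by ORF25-1 and ORF25ng (MYRKLIALPFALLLAACGRE...), it reports Cys17, the residue that would become the lipid-modified N-terminal cysteine of the mature protein if the prediction holds.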

ORF25-1 (37 kDa) was cloned in pET and pGEX vectors and expressed in E. coli as described
above. The products of protein expression and purification were analyzed by SDS-PAGE. Figure
16A shows the results of affinity purification of the GST-fusion protein, and Figure 16B shows the
results of expression of the His-fusion in E. coli. Purified His-fusion protein was used to immunise
mice, whose sera were used for Western blot (Figure 16C), ELISA (positive result), and FACS
analysis (Figure 16D). These experiments confirm that ORF25-1 is a surface-exposed protein, and
that it is a useful immunogen.
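The 37 kDa figure for ORF25-1 agrees with a back-of-the-envelope estimate: 338 residues at roughly 110 Da each is about 37 kDa. A finer estimate sums average residue masses and adds one water. The sketch below does this, assuming standard average residue masses rounded to two decimals and ignoring any fusion tag, signal peptide processing or lipidation; `average_mass` is an illustrative helper.

```python
# Average residue masses in daltons (free amino acid mass minus one water),
# rounded to two decimals.
RESIDUE_MASS = {
    "G": 57.05, "A": 71.08, "S": 87.08, "P": 97.12, "V": 99.13,
    "T": 101.11, "C": 103.14, "L": 113.16, "I": 113.16, "N": 114.10,
    "D": 115.09, "Q": 128.13, "K": 128.17, "E": 129.12, "M": 131.19,
    "H": 137.14, "F": 147.18, "R": 156.19, "Y": 163.18, "W": 186.21,
}
WATER = 18.02  # one water restores the free N- and C-termini


def average_mass(protein: str) -> float:
    """Approximate average mass (Da) of an unmodified polypeptide.

    A trailing stop symbol '*' is ignored; any other unknown letter
    (for example X) raises KeyError rather than being guessed.
    """
    residues = protein.rstrip("*").upper()
    return sum(RESIDUE_MASS[aa] for aa in residues) + WATER


print(round(average_mass("GG"), 2))  # glycylglycine -> 132.12
```

Passing the 338-residue ORF25-1 sequence from <SEQ ID 684> gives an estimate for the untagged polypeptide; a GST- or His-fusion will run larger by the mass of its tag.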
Figure 16E shows plots of hydrophilicity, antigenic index, and AMPHI regions for ORF25-1.
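The source does not say which scale was used for the hydrophilicity plot in Figure 16E. As a stand-in, the sketch below computes a sliding-window Kyte-Doolittle hydropathy profile, a different but related scale on which positive values mark hydrophobic stretches (the opposite sign convention to a hydrophilicity plot). The antigenic index and AMPHI analyses are not reproduced; `hydropathy_profile` and the default window of 9 residues are assumptions.

```python
# Kyte-Doolittle hydropathy values (positive = hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein: str, window: int = 9) -> list[float]:
    """Mean hydropathy of each full window along the sequence.

    Element i of the result covers residues i+1 .. i+window (1-based).
    """
    values = [KYTE_DOOLITTLE[aa] for aa in protein.rstrip("*").upper()]
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


profile = hydropathy_profile("MYRKLIALPFALLLAACGREEPPKALECAN")
peak = max(range(len(profile)), key=profile.__getitem__)
print(f"most hydrophobic window starts at residue {peak + 1}")
```

On the ORF25-1 N-terminus the highest-scoring window falls in the hydrophobic stretch preceding the predicted lipidation site, which is the kind of feature such plots are used to locate.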



Example 82 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 689> 

1 ATGCAGCTGA TCGACTATTC ACATTCATTT TTCTCGGTTG TGCCACCCTT 

51 TTTGGCACTG GCACTTGCCG TCATTACCCG CCGCGTACTG CTGTCTTTAG 

101 GCATCGGTAT TCTGGwysGC GTTGCCTTTT TGGTCGGCGG CAACCCCGTC 

151 GACGGTCTGA CACACCTGAA AGACATGGTC GTCGGCTTGG CTTGGTCAGA 

201 CGsyGATTGG. TCGCTGGGCA AACCAAAAAT CTTGGTTTTC CkGATACTTT 

251 TGGGTATTTT TACTTCCCTG CTGACCTACT CCGGCAGCAA T 

// 

851 AC TTCGCTGGTA 

901 TTCGGCGGCA CTTGCGGCGT CTTTGCCGTC GTTCTCTGCA CGCTCGGCAC 

951 GATTAAAACC GCCGACTATC CCAAAGCCGT TTGGCAGGGT GCGAAATCTA 

1001 TGTTCGGCGC AATCGCCATT TTAATCCTCG CTTGGCTCAT CAGTACGGTT 

1051 GTCGGCGAAA TGCACACCGG CGATTACCTC TCCACACTGG TTGCGGGCAA 

1101 CATCCATCCC GGCTTCCTGC CCGTCATCCT CTTCCTGCTC GCCAGCGTGA 

1151 TGGCGTTTGC CACAGGCACA AGCTGGGGGA CGTTCGGCAT TATGCTGCCG 

1201 ATTGCCGCCG CCATGGCGGT CAAAGTCGAA CCCGCGCTGA TTATCCCGTG 

1251 TATGTCCGCA GTAATGGCGG GGGCGGTATG CGGCGACCAC TGCTCGCCCA 

1301 TTTCCGACAC GACCATCCTG TCGTCCACCG GCGCGCGCTG CAACCACATC 

1351 GACCACGTTA CCTCGCAACT GCCTTACGCC TTAACCGTTG CCGCCGCCGC 

1401 CGCATCGGGC TACCTCGCAT TGGGTCTGAC AAAATCCGCG CTGTTGGGCT 

1451 TTGGCACGAC AGGCATTGTA TTGGCGGTGC TGATTTTTCT GTTGAAAGAT 

1501 AAAAAA. . 

This corresponds to the amino acid sequence <SEQ ID 690; ORF26>: 

1 MQLIDYSHSF FSVVPPFLAL ALAVITRRVL LSLGIGILXX VAFLVGGNPV

51 DGLTHLKDMV VGLAWSDXDW SLGKPKILVF XILLGIFTSL LTYSGSN...

// 

251 TSLV 

301 FGGTCGVFAV VLCTLGTIKT ADYPKAVWQG AKSMFGAIAI LILAWLISTV 

351 VGEMHTGDYL STLVAGNIHP GFLPVILFLL ASVMAFATGT SWGTFGIMLP 

401 IAAAMAVKVE PALIIPCMSA VMAGAVCGDH CSPISDTTIL SSTGARCNHI 

451 DHVTSQLPYA LTVAAAAASG YLALGLTKSA LLGFGTTGIV LAVLIFLLKD 

501 KK. . 

Further work revealed the complete nucleotide sequence <SEQ ID 691>: 

1 ATGCAGCTGA TCGACTATTC ACATTCATTT TTCTCGGTTG TGCCACCCTT 

51 TTTGGCACTG GCACTTGCCG TCATTACCCG CCGCGTACTG CTGTCTTTAG 

101 GCATCGGTAT TCTGGTCGGC GTTGCCTTTT TGGTCGGCGG CAACCCCGTC 

151 GACGGTCTGA CACACCTGAA AGACATGGTC GTCGGCTTGG CTTGGTCAGA 

201 CGGCGATTGG TCGCTGGGCA AACCAAAAAT CTTGGTTTTC CTGATACTTT 

251 TGGGTATTTT TACTTCCCTG CTGACCTACT CCGGCAGCAA TCAGGCGTTT 

301 GCCGACTGGG CAAAACGGCA CATTAAAAAC CGGCGCGGCG CGAAAATGCT 

351 GACCGCCTGC CTCGTGTTCG TAACCTTTAT CGACGACTAT TTCCACAGTC 

401 TCGCCGTCGG TGCGATTGCC CGCCCCGTTA CCGACAAGTT TAAAGTTTCC 

451 CGCACCAAAC TCGCCTACAT CCTCGACTCC ACTGCCGCTC CTATGTGCGT 

501 GCTGATGCCC GTTTCAAGCT GGGGCGCGTC GATTATCGCC ACGCTTGCCG 

551 GACTGCTCGT TACCTACAAA ATCACCGAAT ACACGCCGAT GGGGACGTTT

601 GTCGCCATGA GCCTGATGAA CTATTACGCA CTGTTTGCCC TGATTATGGT 

651 GTTCGTCGTC GCATGGTTTT CCTTCGACAT CGGCTCGATG GCACGTTTCG 

701 AACAAGCCGC GTTGAACGAA GCCCACGATG AAACTGCCGT TTCAGACGCT 

751 ACCAAAGGTC GTGTTTACGC ACTGATTATT CCCGTTTTGG CCTTAATCGC 

801 CTCAACGGTT TCCGCCATGA TCTACACCGG CGCGCAGGCA AGCGAAACCT 

851 TCAGCATTTT GGGGGCATTT GAAAACACGG ACGTAAACAC TTCGCTGGTA 

901 TTCGGCGGCA CTTGCGGCGT CCTTGCCGTC GTTCTCTGCA CGCTCGGCAC 

951 GATTAAAACC GCCGACTATC CCAAAGCCGT TTGGCAGGGT GCGAAATCTA 

1001 TGTTCGGCGC AATCGCCATT TTAATCCTCG CTTGGCTCAT CAGTACGGTT 

1051 GTCGGCGAAA TGCACACCGG CGATTACCTC TCCACACTGG TTGCGGGCAA 

1101 CATCCATCCC GGCTTCCTGC CCGTCATCCT CTTCCTGCTC GCCAGCGTGA 

1151 TGGCGTTTGC CACAGGCACA AGCTGGGGGA CGTTCGGCAT TATGCTGCCG 

1201 ATTGCCGCCG CCATGGCGGT CAAAGTCGAA CCCGCGCTGA TTATCCCGTG 

1251 TATGTCCGCA GTAATGGCGG GGGCGGTATG CGGCGACCAC TGCTCGCCCA 

1301 TTTCCGACAC GACCATCCTG TCGTCCACCG GCGCGCGCTG CAACCACATC

1351 GACCACGTTA CCTCGCAACT GCCTTACGCC TTAACCGTTG CCGCCGCCGC

1401 CGCATCGGGC TACCTCGCAT TGGGTCTGAC AAAATCCGCG CTGTTGGGCT 

1451 TTGGCACGAC AGGCATTGTA TTGGCGGTGC TGATTTTTCT GTTGAAAGAT 

1501 AAAAAACGCG CCAACGCCTG A 
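The complete sequence above runs from an ATG start codon to an in-frame TGA stop. A minimal way to locate such an open reading frame is sketched below. It scans only the given strand, tries each ATG in turn, ignores alternative start codons such as GTG and TTG, and applies no minimum length, so it is a simplification rather than the ORF-prediction method used in the original work; `first_orf` is a hypothetical helper.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_orf(dna: str):
    """Return the first ATG...stop open reading frame on this strand, or None.

    Each ATG is tried in order of position; reading proceeds codon by codon
    from that ATG until an in-frame stop codon is found.
    """
    seq = dna.upper()
    start = seq.find("ATG")
    while start != -1:
        for pos in range(start, len(seq) - 2, 3):
            if seq[pos:pos + 3] in STOP_CODONS:
                return seq[start:pos + 3]
        # No in-frame stop downstream of this ATG; try the next one.
        start = seq.find("ATG", start + 1)
    return None


# Final 21 nt of <SEQ ID 691> behind an artificial "GG" + ATG prefix.
print(first_orf("GG" + "ATG" + "AAAAAACGCGCCAACGCCTGA"))
# -> ATGAAAAAACGCGCCAACGCCTGA
```

The returned frame encodes M followed by KKRANA and the TGA stop, the C-terminal residues of ORF26-1.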

This corresponds to the amino acid sequence <SEQ ID 692; ORF26-1>:

1 MQLIDYSHSF FSVVPPFLAL ALAVITRRVL LSLGIGILVG VAFLVGGNPV

51 DGLTHLKDMV VGLAWSDGDW SLGKPKILVF LILLGIFTSL LTYSGSNQAF

101 ADWAKRHIKN RRGAKMLTAC LVFVTFIDDY FHSLAVGAIA RPVTDKFKVS

151 RTKLAYILDS TAAPMCVLMP VSSWGASIIA TLAGLLVTYK ITEYTPMGTF

201 VAMSLMNYYA LFALIMVFVV AWFSFDIGSM ARFEQAALNE AHDETAVSDA

251 TKGRVYALII PVLALIASTV SAMIYTGAQA SETFSILGAF ENTDVNTSLV

301 FGGTCGVLAV VLCTLGTIKT ADYPKAVWQG AKSMFGAIAI LILAWLISTV

351 VGEMHTGDYL STLVAGNIHP GFLPVILFLL ASVMAFATGT SWGTFGIMLP

401 IAAAMAVKVE PALIIPCMSA VMAGAVCGDH CSPISDTTIL SSTGARCNHI

451 DHVTSQLPYA LTVAAAAASG YLALGLTKSA LLGFGTTGIV LAVLIFLLKD

501 KKRANA*

Computer analysis of this amino acid sequence gave the following results: 

Homology with the hypothetical transmembrane protein HI1586 of H. influenzae (accession number P44263)
ORF26 and HI1586 show 53% and 49% amino acid identity in 97 and 221 aa overlap at the
N-terminus and C-terminus, respectively:

Orf26 1 MQLIDYSHSFFSVVPPFLALALAVITRRVXXXXXXXXXXXVAFLVGGNPVDGLTHLKDMV 60

M+LID+S S +S+VP LA+ LA+ TRRV L +L V 

HI1586 14 MELIDFSSSVWSIVPALLAIILAIATRRVLVSLSAGIIIGSLMLSDWQIGSAFNYLVKNV 73 

Orf26 61 VGLAWSDXDWSLGKPKILVFXILLGIFTSLLTYSGSN 97

V L ++D + + I++F +LLG+ T+LLT SGSN 

HI1586 74 VSLVYADGEIN-SNMNIVLFLLLLGVLTALLTVSGSN 109 






// 

Orf26 86 IFTSLLTYSGS--NTSLVFGGTCGVFAVVLCTL--GTIKTADYPKAVWQGAKSMFGXXXX 141

+F+ L T+ + TSLV GG C + L + + +Y ++ G KSM G 
HI1586 299 VFSVLGTFENTVVGTSLVVGGFCSIIISTLLIILDRQVSVPEYVRSWIVGIKSMSGAIAI 358



Orf26 142 XXXXXXXSTVVGEMHTGDYLSTLVAGNIHPGFLPVILFLLASVMAFATGTSWGTFGIMLP 201

+ +VG+M TG YLS+LV+GNI FLPVILF+L + MAF+TGTSWGTFGIMLP 
HI1586 359 LFFAWTINKIVGDMQTGKYLSSLVSGNIPMQFLPVILFVLGAAMAFSTGTSWGTFGIMLP 418 

Orf26 202 IAAAMAVKVEPALIIPCMSAVMAGAVCGDHCSPISDTTILSSTGARCNHIDHVTSQXXXX 261
IAAAMA P L++PC+SAVMAGAVCGDHCSP+SDTTILSSTGA+CNHIDHVT+Q

HI1586 419 IAAAMAANAAPELLLPCLSAVMAGAVCGDHCSPVSDTTILSSTGAKCNHIDHVTTQLPYA 478 

Orf26 262 XXXXXXXXXXXXXXXXXKSALLGFGTTGIVLAVLIFLLKDK 302

S L GF T + L V+IF +K + 
HI1586 479 ATVATATSIGYIVVGFTYSGLAGFAATAVSLIVIIFAVKKR 519

Homology with a predicted ORF from N. meningitidis (strain A) 

ORF26 shows 58.2% identity over a 502aa overlap with an ORF (ORF26a) from strain A of N.
meningitidis:

10 20 30 40 50 60

orf26.pep MQLIDYSHSFFSVVPPFLALALAVITRRVLLSLGIGILXXVAFLVGGNPVDGLTHLKDMV
II MM Mill III INI I I I I I I I I I I I I I I M I M I I I M M I M I M I I M M I 
orf26a MQLIDYSHSFFSVVPPFLALALAVITRRVLLSLGIGILVGVAFLVGGNPVDGLTHLKDMV

10 20 30 40 50 60 


70 80 90 99 

orf26.pep VGLAWSDXDWSLGKPKILVFXILLGIFTSLLTYSGSNXX

I I M I I I M II I M I Ml I I I I I I I I I I M I M I 
orf26a VGLAWSDGDWSLGKPKXLVFLILLGIFTSLLTYSGSNQAFADWAKRHIKNRRGAKMLTAC
70 80 90 100 110 120
orf26.pep
orf26a    LVFVTFIDDYFHSLAVGAXARPVTDKFKVSRAKLAYILDSTAAPMCVLMPVSSWGASIIA
                 130       140       150       160       170       180

orf26.pep
orf26a    TLAGLLVTYKITEYTPMGTFVAMSLMNYYALFALIMVFVVAWFSFDIGSMARFEQAALNE
                 190       200       210       220       230       240

                                                         100       110
orf26.pep                                                         TSLV
                                                                  ||||
orf26a    AHDETAVSDGSWGRVYALIIPVLALIASTVSAMIYTGAQASETFSILGAFENTDVNTSLV
                 250       260       270       280       290       300

                 120       130       140       150       160       170
orf26.pep FGGTCGVFAVVLCTLGTIKTADYPKAVWQGAKSMFGAIAILILAWLISTVVGEMHTGDYL
I I I I I I I : I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I
orf26a    FGGTCGVLAVVLCTLGTIKIADYPKAVWQGAKSMFGAIAILILAWLISTVVGEMHTGDYL
                 310       320       330       340       350       360

                 180       190       200       210       220       230
orf26.pep STLVAGNIHPGFLPVILFLLASVMAFATGTSWGTFGIMLPIAAAMAVKVEPALIIPCMSA
I I I I I M I 1 I I I I I I I I I I I i I I I I I I I I I I I I I I I I I I I I I j I I I II : I : I M I I f I I
orf26a    STLVAGNIHPGFLXVILFLLASVMAFATGTSWGTFGIMLPIAAAMAVKVDPSLIIPCMSA
                 370       380       390       400       410       420

                 240       250       260       270       280       290
orf26.pep VMAGAVCGDHCSPISDTTILSSTGARCNHIDHVTSQLPYALTVAAAAASGYLALGLTKSA
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf26a    VMAGAVCGDHCSPISDTTILSSTGARCNHIDHVTSQLPYALTVAAAAASGYLALGLTKSA
                 430       440       450       460       470       480

                 300       310
orf26.pep LLGFGTTGIVLAVLIFLLKDKK
I I I I I : I I ! I I I I I I I I I I I I I
orf26a    LLGFGXTGIVLAVLIFLLKDKKRANAX
                 490       500



The complete length ORF26a nucleotide sequence <SEQ ID 693> is: 



1 ATGCAGCTGA TCGACTATTC ACATTCATTT TTCTCGGTTG TGCCACCCTT

51 TTTGGCACTG GCACTTGCCG TCATTACCCG CCGCGTACTG CTGTCTTTAG

101 GCATCGGTAT TCTGGTCGGC GTTGCCTTTT TGGTCGGCGG CAACCCCGTC

151 GACGGTCTGA CACACCTGAA AGACATGGTC GTCGGCTTGG CTTGGTCAGA

201 CGGCGATTGG TCGCTGGGCA AACCAAAANT CTTGGTTTTC CTGATACTTT

251 TGGGTATTTT TACTTCCCTG CTGACCTACT CCGGCAGCAA TCAGGCGTTT

301 GCCGACTGGG CAAAACGGCA CATTAAAAAC CGGCGCGGCG CGAAAATGCT

351 GACCGCCTGC CTCGTGTTCG TAACCTTTAT CGACGACTAT TTCCACAGTC

401 TCGCCGTCGG TGCGNTTGCC CGCCCCGTTA CCGACAAGTT TAAAGTTTCC

451 CGCGCCAAAC TCGCCTACAT CCTCGACTCC ACTGCCGCGC CTATGTGCGT

501 GCTGATGCCC GTTTCAAGCT GGGGCGCGTC GATTATCGCC ACGCTTGCCG

551 GACTGCTCGT TACCTACAAA ATCACCGAAT ACACGCCGAT GGGGACGTTT

601 GTCGCCATGA GCCTGATGAA CTATTACGCA CTGTTTGCCC TGATTATGGT

651 GTTCGTCGTC GCATGGTTCT CCTTCGACAT CGGCTCGATG GCACGTTTCG

701 AACAAGCCGC GTTGAACGAA GCCCACGATG AAACTGCCGT TTCAGACGGC

751 AGCTGGGGCA GGGTTTACGC ATTGATTATT CCCGTTTTGG CCTTAATCGC

801 CTCAACGGTT TCCGCCATGA TCTACACCGG TGCACAGGCA AGCGAAACCT

851 TCAGCATTTT GGGTGCATTT GAAAATACGG ACGTGAACAC TTCGCTGGTA

901 TTCGGCGGCA CTTGCGGCGT GCTTGCCGTC GTCCTCTGCA CGCTCGGCAC

951 GATTAAAATC GCCGATTATC CCAAAGCCGT TTGGCAGGGT GCGAAATCCA

1001 TGTTCGGCGC AATCGCCATT TTAATCCTTG CCTGGCTCAT CAGTACGGTT

1051 GTCGGCGAAA TGCACACAGG CGACTACCTC TCCACGCTGG TTGCGGGCAA

1101 CATCCATCCC GGCTTCCTGN CCGTCATCCT TTTCCTGCTC GCCAGCGTGA

1151 TGGCGTTTGC CACAGGCACA AGCTGGGGGA CGTTCGGCAT CATGCTGCCG

1201 ATTGCCGCCG CCATGGCGGT CAAAGTCGAT CCCTCACTGA TTATCCCGTG

1251 TATGTCCGCC GTGATGGCGG GGGCGGTATG CGGCGACCAC TGCTCGCCCA

1301 TTTCCGACAC GACCATCCTG TCGTCCACCG GCGCGCGCTG CAACCACATC

1351 GACCACGTTA CNTCGCAACT GCCTTACGCC TTAACCGTTG CCGCCGCCGC

1401 CGCATCGGGN TACCTCGCAT TGGGTCTGAC AAAATCCGCG CTGTTGGGTT 

1451 TTGGCANGAC AGGCATTGTA TTGGCGGTGC TGATTTTTCT GTTGAAAGAT 

1501 AAAAAACGCG CCAACGCCTG A 

This encodes a protein having amino acid sequence <SEQ ID 694>: 

1 MQLIDYSHSF FSVVPPFLAL ALAVITRRVL LSLGIGILVG VAFLVGGNPV

51 DGLTHLKDMV VGLAWSDGDW SLGKPKXLVF LILLGIFTSL LTYSGSNQAF

101 ADWAKRHIKN RRGAKMLTAC LVFVTFIDDY FHSLAVGAXA RPVTDKFKVS

151 RAKLAYILDS TAAPMCVLMP VSSWGASIIA TLAGLLVTYK ITEYTPMGTF

201 VAMSLMNYYA LFALIMVFVV AWFSFDIGSM ARFEQAALNE AHDETAVSDG

251 SWGRVYALII PVLALIASTV SAMIYTGAQA SETFSILGAF ENTDVNTSLV

301 FGGTCGVLAV VLCTLGTIKI ADYPKAVWQG AKSMFGAIAI LILAWLISTV

351 VGEMHTGDYL STLVAGNIHP GFLXVILFLL ASVMAFATGT SWGTFGIMLP

401 IAAAMAVKVD PSLIIPCMSA VMAGAVCGDH CSPISDTTIL SSTGARCNHI

451 DHVTSQLPYA LTVAAAAASG YLALGLTKSA LLGFGXTGIV LAVLIFLLKD

501 KKRANA*

ORF26a and ORF26-1 show 97.8% identity in 506 aa overlap: 

10 20 30 40 50 60 

orf26a.pep MQLIDYSHSFFSVVPPFLALALAVITRRVLLSLGIGILVGVAFLVGGNPVDGLTHLKDMV

I I I I I I II I i I II I I I II I I I I I I I I I II I ! ! I I I I I I I I I I I I I I II I I I I I I I I I M I 
orf26-1 MQLIDYSHSFFSVVPPFLALALAVITRRVLLSLGIGILVGVAFLVGGNPVDGLTHLKDMV

10 20 30 40 50 60 

70 80 90 100 110 120 

orf26a.pep VGLAWSDGDWSLGKPKXLVFLILLGIFTSLLTYSGSNQAFADWAKRHIKNRRGAKMLTAC

I I I I I II I I I I I I I II I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I II I I I I I I I I I I 
orf26-1 VGLAWSDGDWSLGKPKILVFLILLGIFTSLLTYSGSNQAFADWAKRHIKNRRGAKMLTAC

70 80 90 100 110 120 

130 140 150 160 170 180 

orf26a.pep LVFVTFIDDYFHSLAVGAXARPVTDKFKVSRAKLAYILDSTAAPMCVLMPVSSWGASIIA
I I I I I I I I I II I I I I I I I I I I I I I I I I I I I : I I II I I I I I I I I I I I I I I I I I I I I I I I I 
orf26-1 LVFVTFIDDYFHSLAVGAIARPVTDKFKVSRTKLAYILDSTAAPMCVLMPVSSWGASIIA

130 140 150 160 170 180 

190 200 210 220 230 240 

orf26a.pep TLAGLLVTYKITEYTPMGTFVAMSLMNYYALFALIMVFVVAWFSFDIGSMARFEQAALNE
I I I I I I I I M I I I I I I I I I I I I i I I I I I I I I I I I I I I I I I I I I I I I I I 11 1 I I I I I I I I I 
orf26-1 TLAGLLVTYKITEYTPMGTFVAMSLMNYYALFALIMVFVVAWFSFDIGSMARFEQAALNE

190 200 210 220 230 240 

250 260 270 280 290 300 

orf26a.pep AHDETAVSDGSWGRVYALIIPVLALIASTVSAMIYTGAQASETFSILGAFENTDVNTSLV
I I I I I I I I I : : I I I I I I ! I I I I I I I I I I I I I I I I I I I I I I 11 I I I I I I I I I I I I 1 I I I I 
orf26-1 AHDETAVSDATKGRVYALIIPVLALIASTVSAMIYTGAQASETFSILGAFENTDVNTSLV

250 260 270 280 290 300 

310 320 330 340 350 360 

orf26a.pep FGGTCGVLAVVLCTLGTIKIADYPKAVWQGAKSMFGAIAILILAWLISTVVGEMHTGDYL

I I I I I I I I I 1 I 1 I I I I I I 1 I I I I I I I I I I I I I I II I I I I I I I I I I I I I I 1 I I I I I I I I I 
orf26-1 FGGTCGVLAVVLCTLGTIKTADYPKAVWQGAKSMFGAIAILILAWLISTVVGEMHTGDYL

310 320 330 340 350 360 

370 380 390 400 410 420 

orf26a.pep STLVAGNIHPGFLXVILFLLASVMAFATGTSWGTFGIMLPIAAAMAVKVDPSLIIPCMSA

I 1 I 1 I II I I I I I I I I I I I I I I I I I 1 I I I I I I 1 I I I I I I I I I I : I : I I I I I I I I 

orf26-1 STLVAGNIHPGFLPVILFLLASVMAFATGTSWGTFGIMLPIAAAMAVKVEPALIIPCMSA

370 380 390 400 410 420 

430 440 450 460 470 480

orf26a.pep VMAGAVCGDHCSPISDTTILSSTGARCNHIDHVTSQLPYALTVAAAAASGYLALGLTKSA

I II II I Ml I II I MIIMIMII1 11 I I I I I I I I I I I II M I I I I M I I I II II 

orf26-1 VMAGAVCGDHCSPISDTTILSSTGARCNHIDHVTSQLPYALTVAAAAASGYLALGLTKSA

430 440 450 460 470 480 



490 500
orf26a.pep LLGFGXTGIVLAVLIFLLKDKKRANAX
IN I 1:1 I I I I I II MM II II H M I
orf26-1 LLGFGTTGIVLAVLIFLLKDKKRANAX

490 500 
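Where identity is high, listing the individual differences is often clearer than reading them off the match line. The sketch below does this for two equal-length aligned segments; `list_differences` is an illustrative helper, and it assumes the segments are already aligned without gaps.

```python
def list_differences(seq_a: str, seq_b: str, start: int = 1):
    """List (position, residue_a, residue_b) wherever two aligned,
    ungapped segments of equal length differ; `start` is the residue
    number of the first column."""
    if len(seq_a) != len(seq_b):
        raise ValueError("segments must have equal length")
    return [
        (start + offset, a, b)
        for offset, (a, b) in enumerate(zip(seq_a, seq_b))
        if a != b
    ]


# Residues 121-180 of ORF26a and ORF26-1, as aligned above.
orf26a_121 = "LVFVTFIDDYFHSLAVGAXARPVTDKFKVSRAKLAYILDSTAAPMCVLMPVSSWGASIIA"
orf26_1_121 = "LVFVTFIDDYFHSLAVGAIARPVTDKFKVSRTKLAYILDSTAAPMCVLMPVSSWGASIIA"
print(list_differences(orf26a_121, orf26_1_121, start=121))
# -> [(139, 'X', 'I'), (152, 'A', 'T')]
```

Across the full 506-residue overlap there are 11 such positions, and 495/506 is about 97.8%, matching the identity reported above.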

Homology with a predicted ORF from N. gonorrhoeae

ORF26 shows 94.8% and 99% identity in 97 and 206 aa overlap at the N-terminus and C-terminus,
respectively, with a predicted ORF (ORF26ng) from N. gonorrhoeae:

orf26.pep MQLIDYSHSFFSVVPPFLALALAVITRRVLLSLGIGILXXVAFLVGGNPVDGLTHLKDMV 60

M M M M M M M M I M M M M M M M I M I M I I I I I I I I I I I I I I I I I I I I I 
orf26ng MQLIDYSHSFFSVVPPFLALALAVITRRVLLSLGIGILVGVAFLVGGNPVDGLTHLKDMV 60 

orf26.pep VGLAWSDXDWSLGKPKILVFXILLGIFTSLLTYSGSN 97

M M I : I M M I M M M I M M M M M M M M 
orf26ng VGLAWADGDWSLGKPKILVFLILLGIFTSLLTYSGSNQAFADWAKRHIKNRCGAKMLTAC 120 

// 



orf26.pep TSLVFGGTCGVFAVVLCTLGTIKTADYPKA 326

Mil MM I M: M Ml I: II Mill II II 

orf26ng ASTVSAMIYTGAQASETFSILGAFENTDVNTSLVFGGTCGVLAVVLCTFGTIKTADYPKA 326

orf26.pep VWQGAKSMFGAIAILILAWLISTVVGEMHTGDYLSTLVAGNIHPGFLPVILFLLASVMAF 386

M M M M M M I M M M M M M I M M M M I M M M II M I M M II I I M I I M 

orf26ng VWQGAKSMFGAIAILILAWLISTVVGEMHTGDYLSTLVAGNIHPGFLPVILFLLASVMAF 386

orf26.pep ATGTSWGTFGIMLPIAAAMAVKVEPALIIPCMSAVMAGAVCGDHCSPISDTTILSSTGAR 446

I I I I I I I I I I I I I I I II I I I I I I I M I I I I I I I I I I I I I I M I I I I I I I M M I I I I I I I 

orf26ng ATGTSWGTFGIMLPIAAAMAVKVEPALIIPCMSAVMAGAVCGDHCSPISDTTILSSTGAR 446

orf26.pep CNHIDHVTSQLPYALTVAAAAASGYLALGLTKSALLGFGTTGIVLAVLIFLLKDKK 502

I M II I I II ! I M I I I I I I I I M I I I I I I II I I I I II I I II I I M I I I I I M I I I I 

orf26ng CNHIDHVTSQLPYALTVAAAAASGYLALGLTKSALLGFGTTGIVLAVLIFLLKDKKRADV 506 



The complete length ORF26ng nucleotide sequence <SEQ ID 695> is: 



1 ATGCAGCTGA TTGACTATTC ACATTCATTT TTCTCGGTTG TGCCACCCTT 

51 TTTGGCACTG GCACTTGCCG TCATTACCCG CCGCGTACTG CTGTCTTTAG 

101 GCATCGGTAT TTTGGTCGGC GTTGCCTTTT TGGTCGGCGG CAACCCCGTC 

151 GACGGTCTGA CACACCTGAA AGACATGGTC GTCGGCTTGG CTTGGGCAGA 

201 CGGCGATTGG TCGCTGGGCA AACCAAAAAT CTTGGTTTTC CTGATACTTT 

251 TGGGCATTTT CACTTCACTG CTGACCTACT CCGGCAGCAA TCAGGCGTTT 

301 GCCGACTGGG CAAAACGGCA CATTAAAAAC CGGTGCGGCG CGAAAATGCT 

351 GACCGCCTGC CTCGTGTTCG TAACCTTTAT CGACGACTAT TTCCACAGCC 

401 TCGCCGTCGG TGCGATTGCC CGCCCCGTTA CCGACAAGTT TAAAGTTTCC 

451 CGCGCCAAAC TCGCCTACAT CCTCGACTCC ACTGCCTCGC CCATGTGCGT 

501 GCTGATGCCC GTTTCAAGCT GGGGCGCGTC GATTATCGCC ACGCTTGCCG 

551 GATTGCTCGT TACCTACAAA ATTACCGAAT ACACGCCGAT GGGGACGTTT 

601 GTCGCCATGA GCCTGATGAA CTATTACGCG CTGTTTGCCC TGATTATGGT 

651 ATTCGTCGTC GCATGGTTCT CCTTCGACAT CGGCTCGAtg gCGCGTTTCG 

701 AACAGGCTGC GTTGAACGAA gcccaggacg aaaccgccgc tTCAGACgCT 

751 ACCAAAGGTC GTGTTTACGC ATTGATTATT CCCGTTTTGG CCTTAATCGC 

801 CTCAACGGTT TCCGCCATGA TCTACACCGG CGCGCAGGCA AGCGAAACCT 

851 TCAGCATTTT GGGGGCATTT GAAAATACCG ACGTAAACAC TTCGCTGGTA 

901 TTCGGCGGCA CTTGCGGCGT GCTTGCCGTC GTCCTCTGCA CGTTCGGCAC 

951 GATTAAAACC GCCGATTATC CCAAAGCCGT GTGGCAGGGT GCGAAATCCA 

1001 TGTTCGGCGC AATCGCCATT TTAATCCTCG CCTGGCTCAT CAGTACGGTT 

1051 GTCGGCGAAA TGCACACGGG CGACTACCTC TCCACGCTGG TTGCGGGCAA 

1101 CATCCATCCC GGCTTCCTGC CCGTCATCCT CTTCCTGCTC GCCAGCGTGA 

1151 TGGCGTTTGC CACAGGCACA AGCTGGGGGA CGTTCGGCAT TATGCTGCCG 

1201 ATTGCCGCCG CCATGGCGGT CAAAGTCGAA CCCGCGCTGA TTAtcccGTG 

1251 TATGTCCGCA GTAATGGCGG GGGCGGTATG CGGCGACCAC TGTTCGCCCA 

1301 TCTCCGACAC GACCATCCTG TCGTCCACCG GCGCGCGCTG CAACCACATC 

1351 GACCACGTTA CCTCGCAACT GCCTTATGCC CTGACGGTTG CCGCCGCCGC 

1401 CGCATCGGGC TACCTCGCAT TGGGTCTGAC AAAATCCGCG CTGTTGGGCT 

1451 TTGGCACGAC CGGTATTGTA TTGGCGGTGC TGATTTTTCT GTTGAAAGAT 

1501 AAAAAACGCG CCGACGTTTG A 

This encodes a protein having amino acid sequence <SEQ ID 696>: 
1 MQLIDYSHSF FSVVPPFLAL ALAVITRRVL LSLGIGILVG VAFLVGGNPV

51 DGLTHLKDMV VGLAWADGDW SLGKPKILVF LILLGIFTSL LTYSGSNQAF

101 ADWAKRHIKN RCGAKMLTAC LVFVTFIDDY FHSLAVGAIA RPVTDKFKVS

151 RAKLAYILDS TASPMCVLMP VSSWGASIIA TLAGLLVTYK ITEYTPMGTF

201 VAMSLMNYYA LFALIMVFVV AWFSFDIGSM ARFEQAALNE AQDETAASDA

251 TKGRVYALII PVLALIASTV SAMIYTGAQA SETFSILGAF ENTDVNTSLV

301 FGGTCGVLAV VLCTFGTIKT ADYPKAVWQG AKSMFGAIAI LILAWLISTV

351 VGEMHTGDYL STLVAGNIHP GFLPVILFLL ASVMAFATGT SWGTFGIMLP

401 IAAAMAVKVE PALIIPCMSA VMAGAVCGDH CSPISDTTIL SSTGARCNHI

451 DHVTSQLPYA LTVAAAAASG YLALGLTKSA LLGFGTTGIV LAVLIFLLKD

501 KKRADV*

ORF26ng and ORF26-1 show 98.4% identity in 505 aa overlap: 

10 20 30 40 50 60 

orf26-1.pep MQLIDYSHSFFSVVPPFLALALAVITRRVLLSLGIGILVGVAFLVGGNPVDGLTHLKDMV
I I I I I I I I I i I I I I I I I I I 1 I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
orf26ng MQLIDYSHSFFSVVPPFLALALAVITRRVLLSLGIGILVGVAFLVGGNPVDGLTHLKDMV

10 20 30 40 50 60 

70 80 90 100 110 120 

orf26-1.pep VGLAWSDGDWSLGKPKILVFLILLGIFTSLLTYSGSNQAFADWAKRHIKNRRGAKMLTAC
I I I I I : I I I I I I I I I I II I I I I I I I I I t I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
orf26ng VGLAWADGDWSLGKPKILVFLILLGIFTSLLTYSGSNQAFADWAKRHIKNRCGAKMLTAC

70 80 90 100 110 120 

130 140 150 160 170 180 

orf26-1.pep LVFVTFIDDYFHSLAVGAIARPVTDKFKVSRTKLAYILDSTAAPMCVLMPVSSWGASIIA

I I I I II I I I I I I I I I I I I I I I I I I II I II I I : I I I I I I I I I I : I I ! I II I I I I II I I I I I 
orf26ng LVFVTFIDDYFHSLAVGAIARPVTDKFKVSRAKLAYILDSTASPMCVLMPVSSWGASIIA 

130 140 150 160 170 180 

190 200 210 220 230 240 

orf26-1.pep TLAGLLVTYKITEYTPMGTFVAMSLMNYYALFALIMVFVVAWFSFDIGSMARFEQAALNE
I I I I I I I I I I I I I II I I I I I I I I II I I I II I I I I I I I I I I II I I I I I I I I I I I I I I I I I I 
orf26ng TLAGLLVTYKITEYTPMGTFVAMSLMNYYALFALIMVFVVAWFSFDIGSMARFEQAALNE

190 200 210 220 230 240 

250 260 270 280 290 300 

orf26-1.pep AHDETAVSDATKGRVYALIIPVLALIASTVSAMIYTGAQASETFSILGAFENTDVNTSLV
I : I I I I : I I I I I I I I I I I I I I I I I I I I 1 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
orf26ng AQDETAASDATKGRVYALIIPVLALIASTVSAMIYTGAQASETFSILGAFENTDVNTSLV 

250 260 270 280 290 300 

310 320 330 340 350 360 

orf26-1.pep FGGTCGVLAVVLCTLGTIKTADYPKAVWQGAKSMFGAIAILILAWLISTVVGEMHTGDYL
llllllllllll I I : I I 1 I I I I M I i I ! MUM! MINIMI III I I I I I I I I I I I I I 
orf26ng FGGTCGVLAVVLCTFGTIKTADYPKAVWQGAKSMFGAIAILILAWLISTVVGEMHTGDYL

310 320 330 340 350 360

370 380 390 400 410 420 

orf26-1.pep STLVAGNIHPGFLPVILFLLASVMAFATGTSWGTFGIMLPIAAAMAVKVEPALIIPCMSA
I I II I II I I II I I II I I I I I 1 I I I I I I I I I I II I 1 I I I I I I I I I I I I I I I I I I I I I I I I I 
orf26ng STLVAGNIHPGFLPVILFLLASVMAFATGTSWGTFGIMLPIAAAMAVKVEPALIIPCMSA 

370 380 390 400 410 420 

430 440 450 460 470 480 

orf26-1.pep VMAGAVCGDHCSPISDTTILSSTGARCNHIDHVTSQLPYALTVAAAAASGYLALGLTKSA
I I I II I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 1 I I I I I I I I I II II 
orf26ng VMAGAVCGDHCSPISDTTILSSTGARCNHIDHVTSQLPYALTVAAAAASGYLALGLTKSA 

430 440 450 460 470 480 

490 500 
orf26-1.pep LLGFGTTGIVLAVLIFLLKDKKRANAX

I I I I I I I I II I I I I I I I I I M I I I : : 
orf26ng LLGFGTTGIVLAVLIFLLKDKKRADVX

490 500 

In addition, ORF26ng shows significant homology to a hypothetical H. influenzae protein:
sp|P44263|YF86_HAEIN HYPOTHETICAL PROTEIN HI1586 >gi|1074850|pir||C64037 hypothetical
protein HI1586 - Haemophilus influenzae (strain Rd KW20) >gi|1574427 (U32832) H.
influenzae predicted coding region HI1586 [Haemophilus influenzae] Length = 519

Score = 538 bits (1370), Expect = e-152

Identities = 280/507 (55%), Positives = 346/507 (68%), Gaps = 7/507 (1%)

Query: 1 MQLIDYSHSFFSVVPPFLALALAVITRRXXXXXXXXXXXXXAFLVGGNPVDGLTHLKDMV 60
M+LID+S S +S+VP LA+ LA+ TRR L +L V 

Sbjct: 14 MELIDFSSSVWSIVPALLAIILAIATRRVLVSLSAGIIIGSLMLSDWQIGSAFNYLVKNV 73

Query: 61 VGLAWADGDWSLGKPKILVFLILLGIFTSLLTYSGSNQAFADWAKRHIKNRCGAKMLTAC 120

V L +ADG+ + I++FL+LLG+ T+LLT SGSN+AFA+WA+ IK R GAK+L A 

Sbjct: 74 VSLVYADGEIN-SNMNIVLFLLLLGVLTALLTVSGSNRAFAEWAQSRIKGRRGAKLLAAS 132 


Query: 121 LVFVTFIDDYFHSLAVGAIARPVTDKFKVSRAKLAYILDSTASPMCVLMPVSSWGASIIA 180 

LVFVTFIDDYFHSLAVGAIARPVTD+FKVSRAKLAYILDSTA+PMCV+MPVSSWGA II 
Sbjct: 133 LVFVTFIDDYFHSLAVGAIARPVTDRFKVSRAKLAYILDSTAAPMCVMMPVSSWGAYIIT 192 

Query: 181 TLAGLLVTYKITEYTPMGTFVAMSLMNYYALFALIMVFVVAWFSFDIGSMARFEQAALNE 240

+ GLL TY ITEYTP+G FVAMS MN+YA+F++IMVF VA+FSFDI SM R E+ AL 
Sbjct: 193 LIGGLLATYSITEYTPIGAFVAMSSMNFYAIFSIIMVFFVAYFSFDIASMVRHEKLALKN 252

Query: 241 AQDETAAS DATKGRVYALI I PVLALI ASTVSAMI YTGAQA SETFSILGAFENTDVN 296 

+D+ TKG+V LI+P+L LI +TVS MIYTGA+A + FS+LG FENT V

Sbjct: 253 TEDQLEEETGTKGQVRNLILPILVLIIATVSMMIYTGAEALAADGKVFSVLGTFENTVVG 312

Query: 297 TSLVFGGTCGVL--AWLCTFGTIKTADYPKAVWQGAKSMFGXXXXXXXXXXXSTWGEM 354
TSLV GG C ++ +++ + +Y ++ G KSM G + +VG+M 

Sbjct: 313 TSLVVGGFCSIIISTLLIILDRQVSVPEYVRSWIVGIKSMSGAIAILFFAWTINKIVGDM 372

Query: 355 HTGDYLSTLVAGNIHPGFLPVILFLLASVMAFATGTSWGTFGIMLPIAAAMAVKVEPALI 414 

TG YLS+LV+GNI FLPVILF+L + MAF+TGTSWGTFGIMLPIAAAMA P L+ 
Sbjct: 373 QTGKYLSSLVSGNIPMQFLPVILFVLGAAMAFSTGTSWGTFGIMLPIAAAMAANAAPELL 432 

Query: 415 IPCMSAVMAGAVCGDHCSPISDTTILSSTGARCNHIDHVTSQXXXXXXXXXXXXXXXXXX 474 

+PC+SAVMAGAVCGDHCSP+SDTTILSSTGA+CNHIDHVT+Q 
Sbjct: 433 LPCLSAVMAGAVCGDHCSPVSDTTILSSTGAKCNHIDHVTTQLPYAATVATATSIGYIVV 492






Query: 475 XXXKSALLGFGTTGIVLAVLIFLLKDK 501

S L GF T + L V+IF +K + 
Sbjct: 493 GFTYSGLAGFAATAVSLIVIIFAVKKR 519
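The Score and Expect fields in BLAST reports like the one above are normally interpreted with Karlin-Altschul statistics. The relations below assume the standard BLAST definitions. The scoring parameters λ and K and the effective lengths m and n are not reported in this excerpt, so the formulas are illustrative only. A raw alignment score S is converted to a bit score S′, and from that to the number of alignments E expected by chance:

```latex
S' = \frac{\lambda S - \ln K}{\ln 2},
\qquad
E = m \, n \, 2^{-S'}
```

Here m is the (effective) query length and n the (effective) total length of the database searched. Because E falls by a factor of two for every additional bit, a bit score of several hundred gives an E-value far below 1 for any realistic database. This is consistent with the very small Expect values reported for these hits.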

Based on this analysis, it is predicted that these proteins from N. meningitidis and N. gonorrhoeae,
and their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies.

Example 83

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 697>:



1 ..AAGCAATGGT ATGCCGACGN . AGTATCAAG ACGGAAATGG TTATGGTCAA 

51 CGATGAGCCT GCCAAAATTC TGACTTGGGA TGAAAGCGGC CGATTACTCT 

101 CGGAACTGTC TATCCGCCAC CATCAACGCA ACGGGGTGGT TTTGGAGTGG

151 TATGAAGATG GTTCTAAAAA GAGCGAAGT. GTTTATCAGG ATGACAAGTT 

201 GGTCAGGAAA ACCCAGTGGG ATAAGGATGG TTATTTAATC GAACCCTGA 

This corresponds to the amino acid sequence <SEQ ID 698; ORF27>: 



1 ..KQWYADXSIK TEMVMVNDEP AKILTWDESG RLLSELSIRH HQRNGVVLEW
51 YEDGSKKSEX VYQDDKLVRK TQWDKDGYLI EP*

Further work revealed the complete nucleotide sequence <SEQ ID 699>: 

1 ATGAAAAAAT TATCTCGGAT TGTATTTTCA ACTGTCCTGT TGGGTTTTTC 

51 GGCCGCTTTG CCGGCGCAGA CCTATTCTGT TTATTTTAAT CAGAACGGAA 

101 AGCTGACGGC GACGATGTCT TCTGCCGCTT ATATCAGGCA ATATAGTGTG 

151 GTGGCGGGTA TTGCGCACGC GCAGGATTTT TATTATCCGT CGATGAAGAA
201 ATATTCTGAA CCTTATATCG TTGCTTCAAC GCAAATCAAA TCTTTTGTGC

251 CTACCCTGCA AAACGGTATG TTGATTTTGT GGCATTTTAA TGGTCAGAAA 

301 AAAATGGCGG GGGGCTTCAG CAAGGGTAAG CCGGACGGGG AGTGGGTCAA 

351 CTGGTATCCG AACGGTAAAA AATCTGCCGT TATGCCTTAT AAAAATGGCT 

401 TGAGTGAGGG TACGGGATAC CGCTATTACC GTAACGGCGG CAAGGAAAGC 

451 GAAATCCAGT TTAAGCAAAA TAAGGCAAAC GGCGTATGGA AGCAATGGTA 

501 TGCCGACGGC AGTATCAAGA CGGAAATGGT TATGGTCAAC GATGAGCCTG 

551 CCAAAATTCT GACTTGGGAT GAAAGCGGCC GATTACTCTC GGAACTGTCT 

601 ATCCGCCACC ATCAACGCAA CGGGGTGGTT TTGGAGTGGT ATGAAGATGG 

651 TTCTAAAAAG AGCGAAGCTG TTTATCAGGA TGACAAGTTG GTCAGGAAAA 

701 CCCAGTGGGA TAAGGATGGT TATTTAATCG AACCCTGA 

This corresponds to the amino acid sequence <SEQ ID 700; ORF27-1>:



1 MKKLSRIVFS TVLLGFSAAL PAQTYSVYFN QNGKLTATMS SAAYIRQYSV 

51 VAGIAHAQDF YYPSMKKYSE PYIVASTQIK SFVPTLQNGM LILWHFNGQK 

101 KMAGGFSKGK PDGEWVNWYP NGKKSAVMPY KNGLSEGTGY RYYRNGGKES 

151 EIQFKQNKAN GVWKQWYADG SIKTEMVMVN DEPAKILTWD ESGRLLSELS 

201 IRHHQRNGVV LEWYEDGSKK SEAVYQDDKL VRKTQWDKDG YLIEP*
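Each "This corresponds to the amino acid sequence" step above is a translation of the nucleotide listing in reading frame 1. The Python sketch below illustrates that step under stated assumptions:

- It uses the standard genetic code (NCBI translation table 1).
- It writes stop codons as "*", as the listings do.
- It returns "X" for any codon containing an ambiguity code (N, K, Y, ...) or a "." placeholder. This is one way X residues can arise in partial sequences such as SEQ ID 698.

The function name `translate` is illustrative and is not taken from any tool used in the source.

```python
# Minimal sketch. Assumptions: reading frame 1, standard genetic code
# (NCBI translation table 1), '*' marks stop codons as in the listings.
BASES = "TCAG"
# Amino acids for codons TTT, TTC, TTA, TTG, TCT, ..., GGG in that order.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate frame 1 of a nucleotide listing into one-letter amino acids.

    Whitespace and digits (the position numbers of the listings) are dropped.
    Every other character is kept, so a '.' placeholder or an ambiguity code
    such as N, K or Y keeps the reading frame intact and yields 'X'.
    """
    seq = "".join(ch for ch in dna.upper() if not ch.isspace() and not ch.isdigit())
    residues = []
    for pos in range(0, len(seq) - 2, 3):
        residues.append(CODON_TABLE.get(seq[pos:pos + 3], "X"))
    return "".join(residues)


print(translate("1 ATGAAAAAAT TATCTCGGAT TGTATTTTCA"))  # MKKLSRIVFS
```

For example, the first 30 nucleotides of SEQ ID 699 translate to MKKLSRIVFS, the first ten residues of ORF27-1 (SEQ ID 700). Remove leading or trailing ".." continuation marks on partial listings before translating, because the sketch treats "." as a base placeholder.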

Computer analysis of this amino acid sequence gave the following results: 



Homology with a predicted ORF from N. meningitidis (strain A) 

ORF27 shows 91.5% identity over an 82 aa overlap with an ORF (ORF27a) from strain A of N. meningitidis:



10 20 30 

orf27.pep KQWYADXSIKTEMVMVNDEPAKILTWDESG

I I I I I I : I I I II I I I I I I I I I I I I I I I I I 
orf27a LSEGTGXRYYRNGGKESEIQFKQNKANGVWKQWYADGNIKTEMVMVNDEPAKILTWDESG 
140 150 160 170 180 190 



40 50 60 70 80 

orf27.pep RLLSELSIRHHQRNGVVLEWYEDGSKKSEXVYQDDKLVRKTQWDKDGYLIEPX
I I I I I I I I : I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
orf27a RLLSELSIHHHXRNGVVLEWYEDGSKKXEAVYQDDKLVRKTQWDXDGYLIEPX
200 210 220 230 240 

The complete length ORF27a nucleotide sequence <SEQ ID 701> is: 



1 ATGAAAAAAT TATCTCGGAT TGTATTTTCA ACTGTCCTGT TGGGTTTTTC 

51 GGCCGCTTTG CCGGCGCAGA NCTATTCTGT TTATTTTAAT CAGAACGGGA 

101 AACTGACGGC GACGNTGTCT TCTGCCGCNT ATATCAGGCA ATATAGTGTG 

151 GCGGAGGGTA TTGCGCACGC GCAGGANTTT TANTATCCGT CGATGAAGAA 

201 ATATTCCGAA CCTTATATCG TTGCTTCAAC GCAAATCAAA TCTTTTGTGC 

251 CTACCCTGCA AAACGGTATG TTGATTTTGT GGCATTTTAA NGGTCAGAAA 

301 AAAATGGCNG GGGGCTTCAG CAAGGGTAAG CCGGACGGGG AGTGGGTCAA 

351 CTGGTATCCG AACGGTAAAA AATCTGCCGT TATGCCTTAT AAAAATGGTT 

401 TGAGTGAAGG TACGGGGTNN CGCTATTACC GTAACGGCGG CAAGGAAAGC 

451 GAAATCCAGT TTAAACAGAA TAAGGCAAAC GGCGTATGGA AGCAATGGTA 

501 TGCCGACGGC AATATCAAAA CGGAAATGGT TATGGTCAAT GATGAGCCTG 

551 CCAAAATTCT GACATGGGAT GAAAGCGGTC GATTACTCTC GGAACTGTCT 

601 ATCCATCATC ATNAACGTAA TGGAGTAGTC TTAGAGTGGT ATGAAGATGG

651 TTCTAAAAAG ANTGAAGCTG TTTATCAGGA TGATAAGTTG GTCAGGAAAA 

701 CCCAGTGGGA TAANGATGGT TATTTAATCG AACCCTGA 

This encodes a protein having amino acid sequence <SEQ ID 702>: 



1 MKKLSRIVFS TVLLGFSAAL PAQXYSVYFN QNGKLTATXS SAAYIRQYSV 

51 AEGIAHAQXF XYPSMKKYSE PYIVASTQIK SFVPTLQNGM LILWHFXGQK

101 KMAGGFSKGK PDGEWVNWYP NGKKSAVMPY KNGLSEGTGX RYYRNGGKES 

151 EIQFKQNKAN GVWKQWYADG NIKTEMVMVN DEPAKILTWD ESGRLLSELS 

201 IHHHXRNGVV LEWYEDGSKK XEAVYQDDKL VRKTQWDXDG YLIEP*

ORF27a and ORF27-1 show 94.7% identity in 245 aa overlap: 



10 20 30 40 50 60 

orf27a.pep MKKLSRIVFSTVLLGFSAALPAQXYSVYFNQNGKLTATXSSAAYIRQYSVAEGIAHAQXF
I I I I I I I I I I I I I I I I I I I I I I I : I I I I I I I I I I I I I I I I I I I I I M I I : I I I I I I I
orf27-1 MKKLSRIVFSTVLLGFSAALPAQTYSVYFNQNGKLTATMSSAAYIRQYSVVAGIAHAQDF

10 20 30 40 50 60 

70 80 90 100 110 120 

orf27a.pep XYPSMKKYSEPYIVASTQIKSFVPTLQNGMLILWHFXGQKKMAGGFSKGKPDGEWVNWYP

I I I I I I I I I I I I I I I I I I I I I 1 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
orf27-1 YYPSMKKYSEPYIVASTQIKSFVPTLQNGMLILWHFNGQKKMAGGFSKGKPDGEWVNWYP

70 80 90 100 110 120 

130 140 150 160 170 180 

orf27a.pep NGKKSAVMPYKNGLSEGTGXRYYRNGGKESEIQFKQNKANGVWKQWYADGNIKTEMVMVN
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I : I I I I I I I I I 
orf27-1 NGKKSAVMPYKNGLSEGTGYRYYRNGGKESEIQFKQNKANGVWKQWYADGSIKTEMVMVN

130 140 150 160 170 180 

190 200 210 220 230 240 

orf27a.pep DEPAKILTWDESGRLLSELSIHHHXRNGVVLEWYEDGSKKXEAVYQDDKLVRKTQWDXDG
I I I II I I I I I I I I I I I I I I I I : I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II 
orf27-1 DEPAKILTWDESGRLLSELSIRHHQRNGVVLEWYEDGSKKSEAVYQDDKLVRKTQWDKDG

190 200 210 220 230 240 



orf27a.pep YLIEPX
I I I I I I 

orf27-1 YLIEPX



Homology with a predicted ORF from N. gonorrhoeae 

ORF27 shows 96.3% identity over an 82 aa overlap with a predicted ORF (ORF27ng) from
N. gonorrhoeae:

orf27.pep KQWYADXSIKTEMVMVNDEPAKILTWDESG 30

I I I I I I I I I I I I I I I I I I I I I I II I I I I I 
orf27ng LSEGTGYRYYRNGGKESEIQFKQNKANGVWKQWYADGSIKTEMVMVNDEPAKILTWDESG 193



orf27.pep RLLSELSIRHHQRNGVVLEWYEDGSKKSEXVYQDDKLVRKTQWDKDGYLIEP 82

I I I I I I I II I I : I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I II 
orf27ng RLLSELSIRHHKRNGVVLEWYEDGSKKSEAVYQDDKLVRKTQWDKDGYLIEP 245

The complete length ORF27ng nucleotide sequence <SEQ ID 703> is: 



1 ATGAAGAAAT TATCTCGGAT TGTATTTTCA ATCGTACTGT TGGGTTTTTC 

51 GGCCGCTTTG CCGGCGCAGA CCTATTCTGT TTATTTTAAT CAGAACGGGA 

101 AACTGACGGC GACGATGTCT TCTGCCGCTT ATATCAGGCA ATATAGTGTG 

151 GCGGCGGGTA TCGCACACGC GCAGGATTTT TATTATCCGT CGATGAAGAA 

201 ATATTCCGAA CCTTATATCG TTGCTTCAAC GCAAATCAAA TCTTTTGTGC 

251 CTACCCTGCA AAACGGTATG TTGATTTTGT GGCATTTTAA TGGTCAGAAA 

301 AAAATGGCGG GGGGCTTCAG CAAGGGTAAG CCGGACGGGG AATGGGTCAA 

351 CTGGTATCCG AACGGTAAAA AATCTGCGGT TATGCCTTAT AAAAATGGCT 

401 TGAGTGAGGG TACGGGATAC CGTTATTACC GTAACGGCGG CAAGGAAAGC 

451 GAAATCCAGT TTAAGCAAAA TAAGGCGAAC GGCGTATGGA AGCAATGGTA 

501 TGCCGATGGA AGTATCAAGA CGGAAATGGT TATGGTCAAC GATGAGCCTG 

551 CCAAAATTCT GACTTGGGAT GAAAGCGGCC GATTACTTTC GGAACTGTCT 

601 ATCCGCCACC ATAAACGCAA CGGGGTGGTT TTGGAGTGGT ATGAAGATGG 

651 TTCTAAAAAG AGCGAGGCTG TTTATCAGGA TGACAAGTTG GTCAGGAAAA 

701 CCCAATGGGA TAAGGATGGT TATTTAATCG AACCCTGA 

This encodes a protein having amino acid sequence <SEQ ID 704>: 



1 MKKLSRIVFS IVLLGFSAAL PAQTYSVYFN QNGKLTATMS SAAYIRQYSV

51 AAGIAHAQDF YYPSMKKYSE PYIVASTQIK SFVPTLQNGM LILWHFNGQK 

101 KMAGGFSKGK PDGEWVNWYP NGKKSAVMPY KNGLSEGTGY RYYRNGGKES 

151 EIQFKQNKAN GVWKQWYADG SIKTEMVMVN DEPAKILTWD ESGRLLSELS 

201 IRHHKRNGVV LEWYEDGSKK SEAVYQDDKL VRKTQWDKDG YLIEP*

ORF27ng and ORF27-1 show 98.8% identity in 245 aa overlap: 

10 20 30 40 50 60 

orf27-1.pep MKKLSRIVFSTVLLGFSAALPAQTYSVYFNQNGKLTATMSSAAYIRQYSVVAGIAHAQDF



! I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I : I I I I I I I I I 
orf27ng MKKLSRIVFSIVLLGFSAALPAQTYSVYFNQNGKLTATMSSAAYIRQYSVAAGIAHAQDF 

10 20 30 40 50 60 

70 80 90 100 110 120 

orf27-1.pep YYPSMKKYSEPYIVASTQIKSFVPTLQNGMLILWHFNGQKKMAGGFSKGKPDGEWVNWYP
I I I I I I II i I I I I I I I I I I I I I I I I I I II II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
orf27ng YYPSMKKYSEPYIVASTQIKSFVPTLQNGMLILWHFNGQKKMAGGFSKGKPDGEWVNWYP 

70 80 90 100 110 120 



130 140 150 160 170 180 

orf27-1.pep NGKKSAVMPYKNGLSEGTGYRYYRNGGKESEIQFKQNKANGVWKQWYADGSIKTEMVMVN
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I II I I I II I I I I I I I I I I 
orf27ng NGKKSAVMPYKNGLSEGTGYRYYRNGGKESEIQFKQNKANGVWKQWYADGSIKTEMVMVN

130 140 150 160 170 180

190 200 210 220 230 240 

orf27-1.pep DEPAKILTWDESGRLLSELSIRHHQRNGVVLEWYEDGSKKSEAVYQDDKLVRKTQWDKDG
I I I I I I I I I I I I 11 I I I I I I I I I I : 11 I I I I i I I I 1 I I I I I I I I I I I I I I I I I I I I I I I I 
orf27ng DEPAKILTWDESGRLLSELSIRHHKRNGVVLEWYEDGSKKSEAVYQDDKLVRKTQWDKDG

190 200 210 220 230 240 

orf27-1.pep YLIEPX
I I I I I I

orf27ng YLIEPX 
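The identity figures quoted throughout these examples are the number of identical aligned residues divided by the length of the overlap. For instance, ORF27ng and ORF27-1 show 98.8% identity in a 245 aa overlap. The sketch below assumes a simple convention, labelled hypothetical because it is not necessarily the convention of the alignment program used in the source:

- columns gapped in both rows are skipped;
- an undetermined residue "X" never counts as identical;
- every other column counts towards the overlap.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity of two equal-length, already aligned sequences.

    Hypothetical convention: columns gapped ('-') in both rows are skipped,
    'X' (undetermined residue) never scores as identical, and every other
    column, including a gap in only one row, counts towards the overlap.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    overlap = identical = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" and b == "-":
            continue
        overlap += 1
        if a == b and a not in "-X":
            identical += 1
    if overlap == 0:
        raise ValueError("alignment has no residues")
    return 100.0 * identical / overlap


# First 60 residues of ORF27-1 (SEQ ID 700) and ORF27ng (SEQ ID 704).
orf27_1 = "MKKLSRIVFSTVLLGFSAALPAQTYSVYFNQNGKLTATMSSAAYIRQYSVVAGIAHAQDF"
orf27ng = "MKKLSRIVFSIVLLGFSAALPAQTYSVYFNQNGKLTATMSSAAYIRQYSVAAGIAHAQDF"
print(f"{percent_identity(orf27_1, orf27ng):.1f}%")  # 96.7%
```

On these 60 residues, 58 positions are identical; the differences are T/I at position 11 and V/A at position 51. Over the full 245-residue overlap, the quoted 98.8% corresponds to 242 identical positions, since 242/245 ≈ 0.988.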

Based on this analysis, including the putative leader sequence in the gonococcal protein, it was
predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be
useful antigens for vaccines or diagnostics, or for raising antibodies.

ORF27-1 (24.5 kDa) was cloned in pET and pGex vectors and expressed in E. coli, as described
above. The products of protein expression and purification were analyzed by SDS-PAGE. Figure
17A shows the results of affinity purification of the GST-fusion protein, and Figure 17B shows the
results of expression of the His-fusion in E. coli. Purified GST-fusion protein was used to immunise
mice, whose sera gave a positive result in ELISA, confirming that ORF27-1 is
a surface-exposed protein and a useful immunogen.
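The 24.5 kDa size given for ORF27-1 can be compared against an estimate computed from sequence. The source does not say how the figure was obtained. It also does not say whether it refers to the full-length polypeptide, the mature protein after leader removal, or the expressed construct. The sketch below is therefore illustrative and is not claimed to reproduce that value. It sums commonly tabulated average residue masses and adds one water for the chain termini. `average_mass_kda` is a hypothetical helper name.

```python
# Average residue masses in daltons (free amino acid minus one water), as
# commonly tabulated; treat them as approximate.
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.0153  # added once for the free N- and C-termini


def average_mass_kda(protein: str) -> float:
    """Approximate average mass (kDa) of an unmodified polypeptide chain.

    Non-letters (position numbers, spaces, the '*' stop marker) are ignored;
    undetermined residues such as 'X' raise ValueError instead of guessing.
    """
    residues = [aa for aa in protein.upper() if aa.isalpha()]
    if not residues:
        raise ValueError("empty sequence")
    unknown = sorted({aa for aa in residues if aa not in RESIDUE_MASS})
    if unknown:
        raise ValueError("no mass for residue(s): " + ", ".join(unknown))
    return (sum(RESIDUE_MASS[aa] for aa in residues) + WATER) / 1000.0


print(round(average_mass_kda("1 MKK*"), 4))  # 0.4056
```

A predicted leader peptide, or a GST or His tag on an expressed fusion, would shift the expected value. The choice of sequence to pass in matters as much as the arithmetic.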

Example 84 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 705>:

1 ATGAAATTTA CCAAGCACCC CGTCTGGGCA ATGGCGTTCC GCCCATTTTA 

51 TTCGCTGGCG GCTCTGTACG GCGCATTGTC CGTATTGCTG TGGGGTTTCG

101 GCTACACGGG AACGCACkAG CTGTCCGGTT TCTATTGGCA CGCGCATGAg 

151 ATGATTTGGG GTTATGCCGG ACTGGTCGTC ATCGCCTTCC TGCTGACCGC 

201 CGTCGCCACT TGGACGGGGC AGCCGCCCAC GCGGGGCGGC GTaTCTGGTC 

251 GGCTTGACTA TCTTTTGGCT GGCTGCGCGG ATTGCCGCCT TTATCCCGGG 

301 TTGGGGTGCG TCGGCAAGCG GCATACTCGG TACGCTGTTT TTCTGGTACG

351 GCGCGGTGTG CATGGCTTTG CCCGTTATCC GTTCGCAGAA TCAACGCAAC 

401 TATGTTgCCG TGTTCGCGCT GTTCGTCTTG GGCGGCACGC ATGCGGCGTT 

451 CCACGTCCAG CTGCACAACG GCAACCTAGG CGGACTCTTG AGCGGATTGC 

501 AGTCGGGCTT GGTGATG 

This corresponds to the amino acid sequence <SEQ ID 706; ORF47>:

1 MKFTKHPVWA MAFRPFYSLA ALYGALSVLL WGFGYTGTHX LSGFYWHAHE 
51 MIWGYAGLVV IAFLLTAVAT WTGQPPTRGG VLVGLTIFWL AARIAAFIPG
101 WGASASGILG TLFFWYGAVC MALPVIRSQN QRNYVAVFAL FVLGGTHAAF
151 HVQLHNGNLG GLLSGLQSGL VM

Further work revealed the complete nucleotide sequence <SEQ ID 707>: 



1 ATGAAATTTA CCAAGCACCC CGTCTGGGCA ATGGCGTTCC GCCCATTTTA

51 TTCGCTGGCG GCTCTGTACG GCGCATTGTC CGTATTGCTG TGGGGTTTCG

101 GCTACACGGG AACGCACGAG CTGTCCGGTT TCTATTGGCA CGCGCATGAG

151 ATGATTTGGG GTTATGCCGG ACTGGTCGTC ATCGCCTTCC TGCTGACCGC

201 CGTCGCCACT TGGACGGGGC AGCCGCCCAC GCGGGGCGGC GTTCTGGTCG

251 GCTTGACTAT CTTTTGGCTG GCTGCGCGGA TTGCCGCCTT TATCCCGGGT

301 TGGGGTGCGT CGGCAAGCGG CATACTCGGT ACGCTGTTTT TCTGGTACGG

351 CGCGGTGTGC ATGGCTTTGC CCGTTATCCG TTCGCAGAAT CAACGCAACT

401 ATGTTGCCGT GTTCGCGCTG TTCGTCTTGG GCGGCACGCA TGCGGCGTTC

451 CACGTCCAGC TGCACAACGG CAACCTAGGC GGACTCTTGA GCGGATTGCA

501 GTCGGGCTTG GTGATGGTGT CGGGTTTTAT CGGTCTGATT GGTACGCGGA

551 TTATTTCGTT TTTTACGTCC AAACGCTTGA ATGTGCCGCA GATTCCCAGT

601 CCGAAATGGG TGGCGCAGGC TTCGCTGTGG CTGCCCATGC TGACTGCCAT

651 GCTGATGGCG CACGGTGTGT TGGCTTGGCT GTCTGCCGTT TTTGCCTTTG

701 CGGCAGGTGT GATTTTTACC GTGCAGGTGT ACCGCTGGTG GTATAAACCC

751 GTGTTGAAAG AGCCGATGCT GTGGATTCTG TTTGCCGGCT ATCTGTTTAC

801 CGGATTGGGG CTGATTGCGG TCGGCGCGTC TTATTTCAAA CCCGCTTTCC

851 TCAATCTGGG TGTGCATCTG ATCGGGGTCG GCGGTATCGG CGTGCTGACT

901 TTGGGCATGA TGGCGCGTAC CGCGCTTGGT CATACGGGCA ATCCGATTTA

951 TCCGCCGCCC AAAGCCGTTC CCGTTGCGTT TTGGCTGATG ATGGCGGCAA

1001 CCGCCGTCCG TATGGTTGCC GTATTTTCTT CCGGCACTGC CTACACGCAC

1051 AGCATCCGCA CCTCTTCGGT TTTGTTTGCA CTCGCGCTTT TGGTGTATGC

1101 GTGGAAGTAT ATTCCTTGGC TGATTCGTCC GCGTTCGGAC GGCAGGCCCG

1151 GTTGA



This corresponds to the amino acid sequence <SEQ ID 708; ORF47-1>:



1 MKFTKHPVWA MAFRPFYSLA ALYGALSVLL WGFGYTGTHE LSGFYWHAHE 

51 MIWGYAGLVV IAFLLTAVAT WTGQPPTRGG VLVGLTIFWL AARIAAFIPG

101 WGASASGILG TLFFWYGAVC MALPVIRSQN QRNYVAVFAL FVLGGTHAAF

151 HVQLHNGNLG GLLSGLQSGL VMVSGFIGLI GTRIISFFTS KRLNVPQIPS

201 PKWVAQASLW LPMLTAMLMA HGVLAWLSAV FAFAAGVIFT VQVYRWWYKP

251 VLKEPMLWIL FAGYLFTGLG LIAVGASYFK PAFLNLGVHL IGVGGIGVLT

301 LGMMARTALG HTGNPIYPPP KAVPVAFWLM MAATAVRMVA VFSSGTAYTH

351 SIRTSSVLFA LALLVYAWKY IPWLIRPRSD GRPG*

Computer analysis of this amino acid sequence predicts a leader peptide and also gave the 
following results: 

Homology with a predicted ORF from N. meningitidis (strain A) 

ORF47 shows 99.4% identity over a 172 aa overlap with an ORF (ORF47a) from strain A of N. meningitidis:

10 20 30 40 50 60 

orf47.pep MKFTKHPVWAMAFRPFYSLAALYGALSVLLWGFGYTGTHXLSGFYWHAHEMIWGYAGLVV
[ I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I 1 I I I I I I I I I I I I 
orf47a MKFTKHPVWAMAFRPFYSLAALYGALSVLLWGFGYTGTHELSGFYWHAHEMIWGYAGLVV

10 20 30 40 50 60 



70 80 90 100 110 120 

orf47.pep IAFLLTAVATWTGQPPTRGGVLVGLTIFWLAARIAAFIPGWGASASGILGTLFFWYGAVC
I | | | | | | | I I I I I I I I I I II I I I I I I I I I I I I I I I I I I II 1 I I I I I I I I I I I II I I I I I I 
orf47a IAFLLTAVATWTGQPPTRGGVLVGLTIFWLAARIAAFIPGWGASASGILGTLFFWYGAVC

70 80 90 100 110 120 



130 140 150 160 170 

orf47.pep MALPVIRSQNQRNYVAVFALFVLGGTHAAFHVQLHNGNLGGLLSGLQSGLVM

||||||||||||||||||||||||||||||||||||||||||||||||||||

orf47a MALPVIRSQNQRNYVAVFALFVLGGTHAAFHVQLHNGNLGGLLSGLQSGLVMVSGFIGLI
130 140 150 160 170 180 



orf47a GTRIISFFTSKRLNVPQIPSPKWVAQASLWLPMLTAMLMAHGVMPWLSAAFAFAAGVIFT
190 200 210 220 230 240






The complete length ORF47a nucleotide sequence <SEQ ID 709> is: 

1 ATGAAATTTA CCAAGCACCC CGTTTGGGCA ATGGCGTTCC GCCCGTTTTA 

51 TTCACTGGCG GCTCTGTACG GCGCATTGTC CGTATTGCTG TGGGGTTTCG 

101 GCTACACGGG AACGCACGAG CTGTCCGGTT TCTATTGGCA CGCGCATGAG 

151 ATGATTTGGG GTTATGCCGG ACTGGTCGTC ATCGCCTTCC TGCTGACCGC 

201 CGTCGCCACT TGGACGGGGC AGCCGCCCAC GCGGGGCGGC GTTCTGGTCG 

251 GCTTGACTAT CTTTTGGCTG GCTGCGCGGA TTGCCGCCTT TATCCCGGGT 

301 TGGGGTGCGT CGGCAAGCGG CATACTCGGT ACGCTGTTTT TCTGGTACGG 

351 CGCGGTGTGC ATGGCTTTGC CCGTTATCCG TTCGCAGAAT CAACGCAATT 

401 ATGTTGCCGT GTTCGCGCTG TTCGTCTTGG GCGGTACGCA CGCGGCGTTC 

451 CACGTCCAGC TGCACAACGG CAACCTAGGC GGACTCTTGA GCGGATTGCA 

501 GTCGGGCTTG GTGATGGTGT CGGGTTTTAT CGGTCTGATT GGTACGCGGA 

551 TTATTTCGTT TTTTACGTCC AAACGGTTGA ATGTGCCGCA GATTCCCAGT 

601 CCGAAATGGG TGGCGCAGGC TTCGCTGTGG CTGCCCATGC TGACCGCCAT 

651 GCTGATGGCG CACGGCGTGA TGCCTTGGCT GTCGGCGGCT TTCGCGTTTG 

701 CGGCAGGTGT GATTTTTACC GTGCAGGTGT ACCGCTGGTG GTATAAGCCT 

751 GTGTTGAAAG AGCCGATGCT GTGGATTCTG TTTGCCGGCT ATCTGTTTAC 

801 CGGATTGGGG CTGATTGCGG TCGGCGCGTC TTATTTCAAA CCCGCTTTCC 

851 TCAATCTGGG TGTGCATCTG ATCGGGGTCG GCGGTATCGG CGTGCTGACT 

901 TTGGGCATGA TGGCGCGTAC CGCGCTCGGT CATACGGGCA ATCCGATTTA 

951 TCCGCCGCCC AAAGCCGTTC CCGTTGCGTT TTGGCTGATG ATGGCGGCAA 

1001 CCGCCGTCCG TATGGTTGCC GTATTTTCTT CCGGCACTGC CTACACGCAC 

1051 AGCATACGCA CCTCTTCGGT TTTGTTTGCA CTCGCGCTTT TGGTGTATGC 

1101 GTGGAAGTAT ATTCCTTGGC TGATTCGTCC GCGTTCGGAC GGCAGGCCCG 

1151 GTTGA 

This encodes a protein having amino acid sequence <SEQ ID 710>: 

1 MKFTKHPVWA MAFRPFYSLA ALYGALSVLL WGFGYTGTHE LSGFYWHAHE 

51 MIWGYAGLVV IAFLLTAVAT WTGQPPTRGG VLVGLTIFWL AARIAAFIPG

101 WGASASGILG TLFFWYGAVC MALPVIRSQN QRNYVAVFAL FVLGGTHAAF

151 HVQLHNGNLG GLLSGLQSGL VMVSGFIGLI GTRIISFFTS KRLNVPQIPS

201 PKWVAQASLW LPMLTAMLMA HGVMPWLSAA FAFAAGVIFT VQVYRWWYKP

251 VLKEPMLWIL FAGYLFTGLG LIAVGASYFK PAFLNLGVHL IGVGGIGVLT

301 LGMMARTALG HTGNPIYPPP KAVPVAFWLM MAATAVRMVA VFSSGTAYTH

351 SIRTSSVLFA LALLVYAWKY IPWLIRPRSD GRPG*

ORF47a and ORF47-1 show 99.2% identity in 384 aa overlap: 

10 20 30 40 50 60

orf47a.pep MKFTKHPVWAMAFRPFYSLAALYGALSVLLWGFGYTGTHELSGFYWHAHEMIWGYAGLVV
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf47-1 MKFTKHPVWAMAFRPFYSLAALYGALSVLLWGFGYTGTHELSGFYWHAHEMIWGYAGLVV
10 20 30 40 50 60

70 80 90 100 110 120

orf47a.pep IAFLLTAVATWTGQPPTRGGVLVGLTIFWLAARIAAFIPGWGASASGILGTLFFWYGAVC
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf47-1 IAFLLTAVATWTGQPPTRGGVLVGLTIFWLAARIAAFIPGWGASASGILGTLFFWYGAVC
70 80 90 100 110 120

130 140 150 160 170 180

orf47a.pep MALPVIRSQNQRNYVAVFALFVLGGTHAAFHVQLHNGNLGGLLSGLQSGLVMVSGFIGLI
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf47-1 MALPVIRSQNQRNYVAVFALFVLGGTHAAFHVQLHNGNLGGLLSGLQSGLVMVSGFIGLI
130 140 150 160 170 180

190 200 210 220 230 240

orf47a.pep GTRIISFFTSKRLNVPQIPSPKWVAQASLWLPMLTAMLMAHGVMPWLSAAFAFAAGVIFT
I I I M I I I I I I I I M I I I I I I I I I I I I I I I I I I I I I I I I I I I I : I I I |: I II I I I I I I 1
orf47-1 GTRIISFFTSKRLNVPQIPSPKWVAQASLWLPMLTAMLMAHGVLAWLSAVFAFAAGVIFT
190 200 210 220 230 240

250 260 270 280 290 300

orf47a.pep VQVYRWWYKPVLKEPMLWILFAGYLFTGLGLIAVGASYFKPAFLNLGVHLIGVGGIGVLT
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf47-1 VQVYRWWYKPVLKEPMLWILFAGYLFTGLGLIAVGASYFKPAFLNLGVHLIGVGGIGVLT
250 260 270 280 290 300



310 320 330 340 350 360 
orf47a.pep LGMMARTALGHTGNPIYPPPKAVPVAFWLMMAATAVRMVAVFSSGTAYTHSIRTSSVLFA
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 1 I I I I I I I I I I I I I I I 
orf47-1 LGMMARTALGHTGNPIYPPPKAVPVAFWLMMAATAVRMVAVFSSGTAYTHSIRTSSVLFA

310 320 330 340 350 360 

370 380 
orf47a.pep LALLVYAWKYIPWLIRPRSDGRPGX

I I I i I I I I I I I I 1 1 I i I I I I I I I I I 

orf47-1 LALLVYAWKYIPWLIRPRSDGRPGX

370 380 

Homology with a predicted ORF from N. gonorrhoeae 

ORF47 shows 97.1% identity over a 172 aa overlap with a predicted ORF (ORF47ng) from
N. gonorrhoeae:

ORF47 MKFTKHPVWAMAFRPFYSLAALYGALSVLLWGFGYTGTHELSGFYWHAHEMIWGYAGLVV 60

I | I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
ORF47ng MKFTKHPVWAMAFRPFYSLAALYGALSVLLWGFGYTGTHELSGFYWHAHEMIWGYAGLVV 60

ORF47 IAFLLTAVATWTGQPPTRGGVLVGLTIFWLAARIAAFIPGWGASASGILGTLFFWYGAVC 120

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I i I I I I I i I I : I I I I I I I I I I I I I I I I 
ORF47ng IAFLLTAVATWTGQPPTRGGVLVGLTAFWLAARIAAFIPGWGAAASGILGTLFFWYGAVC 120 

ORF47 MALPVIRSQNQRNYVAVFALFVLGGTHAAFHVQLHNGNLGGLLSGLQSGLVM 172 

I | I | I I I I I I: I I I I I I I I : I I I I I I I II I I I I I I I I I I I I I I I I I I I i I I I 
ORF47ng MALPVIRSQNRRNYVAVFAIFVLGGTHAAFHVQLHNGNLGGLLSGLQSGLVMVWGFIGLI 180 

The ORF47ng nucleotide sequence <SEQ ID 711> is predicted to encode a protein comprising
amino acid sequence <SEQ ID 712>:

1 MKFTKHPVWA MAFRPFYSLA ALYGALSVLL WGFGYTGTHE LSGFYWHAHE 

51 MIWGYAGLVV IAFLLTAVAT WTGQPPTRGG VLVGLTAFWL AARIAAFIPG

101 WGAAASGILG TLFFWYGAVC MALPVIRSQN RRNYVAVFAI FVLGGTHAAF

151 HVQLHNGNLG GLLSGLQSGL VMVWGFIGLI GMKIISFFTS KRLKLPQIPS

201 PKWVAHASLW LPMLNAILMA HRVMPWLSAA FPFAAGVIFT VQVYAGGITP

251 IEETSCGSVA GICYRLGNSS G 

The predicted leader peptide and transmembrane domains are identical (except for an Ile/Ala
substitution at residue 87 and a Leu/Ile substitution at residue 140) to sequences in the
meningococcal protein (see also Pseudomonas stutzeri ORF396, accession number e246540):

TM segments in ORF47ng

INTEGRAL  Likelihood = -5.63  Transmembrane   52 -  68
INTEGRAL  Likelihood = -3.88  Transmembrane  169 - 185
INTEGRAL  Likelihood = -3.08  Transmembrane   82 -  98
INTEGRAL  Likelihood = -1.91  Transmembrane  134 - 150
INTEGRAL  Likelihood = -1.44  Transmembrane  107 - 123
INTEGRAL  Likelihood = -1.38  Transmembrane  227 - 243
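The table above lists predicted membrane-spanning segments with an "INTEGRAL" likelihood. Its layout resembles output from PSORT's ALOM module, although the source does not name the program. The sketch below uses a different and much simpler method: a Kyte-Doolittle hydropathy scan. It gives a rough picture of where hydrophobic, potentially membrane-spanning stretches lie, and it will not reproduce the likelihood values above.

Both defaults are assumptions rather than the source's parameters:

- The window of 17 residues matches the length of the segments listed (for example 52 - 68).
- The cut-off of 1.6 is the value Kyte and Doolittle suggested for a 19-residue window.

`hydrophobic_segments` is a hypothetical name.

```python
# Kyte & Doolittle (1982) hydropathy values.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobic_segments(protein: str, window: int = 17, threshold: float = 1.6):
    """Return merged (start, end) spans, 1-based and inclusive, covering every
    window whose mean Kyte-Doolittle hydropathy is >= threshold.

    Windows containing a residue without a value (e.g. 'X') are skipped.
    The window and threshold defaults are assumptions, not the source's.
    """
    seq = "".join(aa for aa in protein.upper() if aa.isalpha())
    segments = []
    for start in range(len(seq) - window + 1):
        chunk = seq[start:start + window]
        if any(aa not in KYTE_DOOLITTLE for aa in chunk):
            continue
        mean = sum(KYTE_DOOLITTLE[aa] for aa in chunk) / window
        if mean < threshold:
            continue
        first, last = start + 1, start + window
        if segments and first <= segments[-1][1] + 1:
            segments[-1] = (segments[-1][0], last)  # extend the previous span
        else:
            segments.append((first, last))
    return segments


print(hydrophobic_segments("K" * 10 + "L" * 20 + "K" * 10))  # [(7, 34)]
```

Overlapping windows that pass the cut-off are merged, so a reported span can be longer than one window. The toy example shows this: windows that still include up to four flanking lysines pass, widening the span beyond the 20-leucine core.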



Further work revealed the complete gonococcal DNA sequence <SEQ ID 713>: 

1 ATGAAATTTA CCAAACATCC CGTCTGGGCA ATGGCGTTCC GCCCGTTTTA 

51 TTCACTGGCG GCACTGTACG GCGCATTGTC CGTATTGCTG TGGGGTTTCG 

101 GCTACACGGG AACGCACGAG CTGTCCGGTT TCTATTGGCA CGCGCATGAG 

151 ATGATTTGGG GTTATGCCGG TCTCGTCGTC ATCGCCTTCC TGCTGACCGC 

201 CGTCGCCACT TGGACGGGAC AGCCGCCCAC GAGGGGCGGC GTTCTGGTCG 

251 GCTTGACCGC CTTTTGGCTG GCTGCGCGGA TTGCCGCCTT TATCCCGGGT 

301 TGGGGTGCGG CGGCAAGCGG CATACTCGGT ACGCTGTTTT TCTGGTACGG 

351 CGCGGTGTGC ATGGCTTTGC CCGTTATCCG TtcgCAAAAC CGGCGCAACT 

401 ATGtcgCCGT ATTCGCAATA TTTGTGCTGG GCGGTACGCA TGCGgcgTTC 

451 CACGtccAgc tGCACAACGG CAACCTAGGC GGACTCTTGA GCGGATTGCA 

501 GTCGGGCCTG GTTATGGTGT CGGGCTTTAT CGGCCTGATT GGGATGAGGA 

551 TTATTTCGTT TTTTACGTCC AAACGGTTGA ACGTGCCGCA GATTCCCAGT 

601 CCGAAATGGG TGGCGCAGGC TTCGCTGTGG CTACCCATGC TGACCGCCAT 
651 ACTGATGGCG CACGGCGTGA TGCCTTGGCT GTCGGCGGCT TTCGCGTTTG

701 CGGCGGGCGT GATTTTTACC GTACAGGTGT ACCGCTGGTG GTATAAACCC

751 GTATTGAAAG AACCGATGCT GTGGATTCTG TTTGCCGGCT ATCTGTTTAC

801 CGGATTGGGG CTGATTGCGG TCGGCGCGTC TTATTTCAAA CCTGCCTTCC

851 TCAATCTGGG CGTACATCTG ATCGGGGTCG GCGGTATCGG CGTGCTGACT

901 TTGGGCATGA TGGCGCGTAC CGCGCTCGGT CATACGGGCA ATTCGATTTA

951 TCCGCCGCCC AAAGCCGTTC CCGTTGCGTT TTGGCTGATG ATGGCGGCAA

1001 CCGCCGTCCG TATGGTTGCC GTATTTTCTT CCGGCACTGC CTACACGCAC

1051 AGCATCCGCA CGTCTTCGGT TTTGTTTGCA CTCGCGCTGC TGGTGTATGC

1101 GTGGAAATAC ATTCCGTGGC TGATCCGTCC GCGTTCGGAC GGCAGGCCCG

1151 GTTGA

This encodes a protein having amino acid sequence <SEQ ID 714; ORF47ng-1>:



1 MKFTKHPVWA MAFRPFYSLA ALYGALSVLL WGFGYTGTHE LSGFYWHAHE 

51 MIWGYAGLVV IAFLLTAVAT WTGQPPTRGG VLVGLTAFWL AARIAAFIPG

101 WGAAASGILG TLFFWYGAVC MALPVIRSQN RRNYVAVFAI FVLGGTHAAF

151 HVQLHNGNLG GLLSGLQSGL VMVSGFIGLI GMRIISFFTS KRLNVPQIPS

201 PKWVAQASLW LPMLTAILMA HGVMPWLSAA FAFAAGVIFT VQVYRWWYKP

251 VLKEPMLWIL FAGYLFTGLG LIAVGASYFK PAFLNLGVHL IGVGGIGVLT

301 LGMMARTALG HTGNSIYPPP KAVPVAFWLM MAATAVRMVA VFSSGTAYTH

351 SIRTSSVLFA LALLVYAWKY IPWLIRPRSD GRPG*

ORF47ng-1 and ORF47-1 show 97.4% identity in 384 aa overlap:

10 20 30 40 50 60 

orf47-1.pep MKFTKHPVWAMAFRPFYSLAALYGALSVLLWGFGYTGTHELSGFYWHAHEMIWGYAGLVV

||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||

orf47ng-1 MKFTKHPVWAMAFRPFYSLAALYGALSVLLWGFGYTGTHELSGFYWHAHEMIWGYAGLVV

10 20 30 40 50 60 

70 80 90 100 110 120 

orf47-1.pep IAFLLTAVATWTGQPPTRGGVLVGLTIFWLAARIAAFIPGWGASASGILGTLFFWYGAVC
I I I I I I I I I I I I 1 I I I I I I I I II I I I I I I I I I I I I I I M I I I : I I I I I I I I I I I I I I 1 I 
orf47ng-1 IAFLLTAVATWTGQPPTRGGVLVGLTAFWLAARIAAFIPGWGAAASGILGTLFFWYGAVC

70 80 90 100 110 120 

130 140 150 160 170 180 

orf47-1.pep MALPVIRSQNQRNYVAVFALFVLGGTHAAFHVQLHNGNLGGLLSGLQSGLVMVSGFIGLI
I | | | | | 1 | | | : | I I I I I I I : I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
orf47ng-1 MALPVIRSQNRRNYVAVFAIFVLGGTHAAFHVQLHNGNLGGLLSGLQSGLVMVSGFIGLI

130 140 150 160 170 180 

190 200 210 220 230 240 

orf47-1.pep GTRIISFFTSKRLNVPQIPSPKWVAQASLWLPMLTAMLMAHGVLAWLSAVFAFAAGVIFT
I I I I I I I I I I I I I I I I I I I i II 1 I I I I I I I I I I I I : I I I I I I : I I I I : I I I I I I I i I I 
orf47ng-1 GMRIISFFTSKRLNVPQIPSPKWVAQASLWLPMLTAILMAHGVMPWLSAAFAFAAGVIFT

190 200 210 220 230 240 

250 260 270 280 290 300 

orf47-1.pep VQVYRWWYKPVLKEPMLWILFAGYLFTGLGLIAVGASYFKPAFLNLGVHLIGVGGIGVLT
I | | | | | II | I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I I I I I I I 1 I I I M I I I I I I I 
orf47ng-1 VQVYRWWYKPVLKEPMLWILFAGYLFTGLGLIAVGASYFKPAFLNLGVHLIGVGGIGVLT

250 260 270 280 290 300 

310 320 330 340 350 360 

orf47-1.pep LGMMARTALGHTGNPIYPPPKAVPVAFWLMMAATAVRMVAVFSSGTAYTHSIRTSSVLFA

I ! I I I I I I 1 I I I I I II MM MM IN II M MM M II MM II I I M I I Ill 

orf47ng-1 LGMMARTALGHTGNSIYPPPKAVPVAFWLMMAATAVRMVAVFSSGTAYTHSIRTSSVLFA

310 320 330 340 350 360 



370 380 
orf47-1.pep LALLVYAWKYIPWLIRPRSDGRPGX
I I I I I I I M II I I I I II I I I I I II 1 
orf47ng-1 LALLVYAWKYIPWLIRPRSDGRPGX

370 380 

Furthermore, ORF47ng-1 shows significant homology to an ORF from Pseudomonas stutzeri:

gnl|PID|e246540 (Z73914) ORF396 protein [Pseudomonas stutzeri] Length = 396
Score = 155 bits (389), Expect = 5e-37
Identities = 121/391 (30%), Positives = 169/391 (42%), Gaps = 21/391 (5%)

Query: 7 PVWAMAFRPFYSLAALYGALSVLLWGFGYTGTHELSGFYWHAHEMIWGYAGLV 59

P+W +AFRPF+ +LY L++ LW +TG GF WH HEM++G+A + 

Sbjct: 14 PIWRLAFRPFFLAGSLYALLAIPLWVAAWTGLWP--GFQPTGGWLAWHRHEMLFGFAMAI 71

Query: 60 VIAFLLTAVATWTGQPPTRGGVLVGLTAFWLAARIAAFIPGWGAAASGILGTLFFWYGAV 119 

V FLLTAV TWTGQ G LVGL A WLAAR+ ++ G AA L LF 
Sbjct: 72 VAGFLLTAVQTWTGQTAPSGNRLVGLAAWLAARL-GWLFGLPAAWLAPLDLLFLVALVW 130 

Query: 120 CMALPVIRSQNRRNYVAVFAIFVLGGTHAAFXXXXXXXXXXXXXXXXXXXXXMVSGFIGL 179

MA + + +RNY V + ++ G . +V+ + L 

Sbjct: 131 MMAQMLWAVRQKRNYPIVWLSLMLGADVLILTGLLQGNDALQRQGVLAGLWLVAALMAL 190 

Query: 180 IGMRIISFFTSKRLNVPQIPSP-KWVAQASLWLPMLTAILMAHGV MPWLSAAFAFA 234 

IG R+I FFT + L P W+ A L + A+L A GV PL FA 

Sbjct: 191 IGGRVIPFFTQRGLGKVDAVKPWVWLDVALLVGTGVIALLHAFGVAMRPQPLLGLLFV-A 249 

Query: 235 AGVIFTVQVYRWWYKPVLKEPMLWILFAGYLFTGLGLIAVGASYF-KPAFXXXXXXXXXX 293 

GV +++ RW+ K + K +LW L L+ + + +F A 
Sbjct: 250 IGVGHLLRLMRWYDKGIWKVGLLWSLHVAMLWLWAAFGLALWHFGLLAQSSPSLHALSV 309 

Query: 294 XXXXXXXXXMMARTALGHTGNSIYPPPKAVPVAFWLXXXXXXXXXXXXFSSGTAYTHSIR 353 

M+AR LGHTG + P+AFL FS + 

Sbjct: 310 GSMSGLILAMIARVTLGHTGRPLQLPAGIIG-AFVL FNLGTAARVFLSVAWPVGGLW 365 

Query: 354 TSSVLFALALLVYAWKYIPWLIRPRSDGRPG 384

++V + LA +Y W+Y P L+ R DG PG 
Sbjct: 366 LAAVCWTLAFALYVWRYAPMLVAARVDGHPG 396 



Based on this analysis, it is predicted that the proteins from N. meningitidis and N. gonorrhoeae, and
their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies.



Example 85 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 715>:

1 . . ATGCCGTCTG AAGGTTCAGA CGGCmTCGGT GyCGGGGAAy CAGAAGyGGT 

51 AGCGCATGCC CAATGAGACT TCGTGGGTTT TGAAGCGGGT GTTTTCCAAG 

101 CGTCCCCAGT TGTGGTAACG GTATCCGGTG TCyAArGTCA GCTTGGGyGT 

151 GATGTCGAAa CCGACACCGG CGATGACACC AAGACCyAmG CTGCTGATrC 

201 TGTkGCTTTC GTGATAGGsA GGTTTGyTGG kmksAsyTTG TAyrATwkkG 

251 CCTssCwsTG kAGmGCCkTk CkyTGGTkkA swGrwArTAG TCGTGGTTTy 

301 TkTTyyCACC GAATGAACyT GATGTTTAAC GTGTCCGTAG GCGACGCGCG 

351 CGCCGATATA GGGTTTGAAT TTATCGTTGA GTTTGAAATC GTAAATGGCG 

401 GACAAGCCGA GAGAAGAAAC GGCGTGGAAG CTGCCGTTTC CCTGATGTTT 

451 TGTTTGGGTT TCTTTGTAGT TGTTGTTTAT CTCTTCAGTA ACTTTTTTAG 

501 TAGAAGAATT ACTTTCTTTC CATTTTCTGT AACTGGCATA ATCTGCCGCT 

551 ATTCTCCAGC CGCCGAAATC . . 

This corresponds to the amino acid sequence <SEQ ID 716; ORF67>: 

1 ..MPSEGSDGXG XGEXEXVAHA QXDFVGFEAG VFQASPVVVT VSGVXXQLGX

51 DVETDTGDDT KTXAADXVAF VIGRFXGXXL YXXAXXXXAX XWXXXXSRGF 

101 XXHRMNLMFN VSVGDARADI GFEFIVEFEI VNGGQAERRN GVEAAVSLMF 

151 CLGFFVVVVY LFSNFFSRRI TFFPFSVTGI ICRYSPAAEI ..

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. gonorrhoeae 

ORF67 shows 51.8% identity over a 199 aa overlap with a predicted ORF (ORF67ng) from
N. gonorrhoeae:
orf67.pep                               MPSEGSDGXGXGEXEXVAHAQXDFVGFEAG   30
                                        I I I I I I I I 1 II I I I t I I I I I I I I I
orf67ng   TNFEIAVLSGMTVRVFYCARPAPVNGGRLKMPSEGSDGIGIGESEAVAHAQRGEVGFEAG   146
          90 100 110 120 130 140

orf67.pep VFQASPVVVTVSGVXXQLGXDVETDTGDDTKTXAADXVAFVIGRFXGXXLYXXAXXXXAX   90
          MM:: ::: II 111:11 I : :
orf67ng   VFQASPVVVAVAGVQGQAGRDVYAHARHRAEAQAAAAVAFLIGVFLRMSVRINRNCCVSI   206

orf67.pep XWXXXXSRGFXXHRMNLMFNVSVGDARADIGFEFIVEFEIVNGGQAERRNGVEAAVSLMF   150
          : I : I : : : : I I I I II I : I I I I I I : I I I I I I I I I I I I I I I I I I I I I I I
orf67ng   TRVGGKSTCYFFSRIDAVSDVSVGDARTDIGFEFVVEFEIVNGGQAERRNGVECAVFLMF   266

orf67.pep CLGFFVVVVYLFSNFFSRRITFF-PFSVTGIICRYSPAAEI   190
          I II :: I: I: : I : II Mill Mill:
orf67ng   RLLVFYVKLVAAKSFIILSFQLFYVHGIFIVVPFPVTGIIRGDAPAAEVVADRHPGVDGM   326



The ORF67ng nucleotide sequence <SEQ ID 717> is predicted to encode a protein comprising
amino acid sequence <SEQ ID 718>:

1 MPSETVGSIV NVGVDESVGF SPPFPSIQHF YRFHRIHRIR LFRPPGPMQL

51 NRHSHGSGNL GRGVWATVLS DKFPCGQVRI PACAGMTNFE IAVLSGMTVR

101 VFYCARPAPV NGGRLKMPSE GSDGIGIGES EAVAHAQRGF VGFEAGVFQA

151 SPVVVAVAGV QGQAGRDVYA HARHRAEAQA AAAVAFLIGV FLRMSVRINR

201 NCCVSITRVG GKSTCYFFSR IDAVSDVSVG DARTDIGFEF VVEFEIVNGG

251 QAERRNGVEC AVFLMFRLLV FYVKLVAAKS FIILSFQLFY VHGIFIVVPF

301 PVTGIIRGDA PAAEVVADRH PGVDGMRTDV SEIIAYRAYF VFAWSGWFRI

351 IVGNAFGGVG



Based on the presence of several putative transmembrane domains in the gonococcal protein, it
is predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be
useful antigens for vaccines or diagnostics, or for raising antibodies.



Example 86 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 719>:

1 ATGTTTGCTT TTTTAGAAGC CTTTTTTGTC GAATACGGTT ATGCGGCTGT 

51 TTTTTTTGTA TTGGTCATCT GCGGTTTCGG CGTGCCGATT CCCGAGGATT 

101 TGACCTTGGT AACAGGCGGC GTGATTTCGG GTATGGGTTA TACCAATCCG 

151 CATATTATGT TTGCAGTCGG TATGCTCGGC GTATTGGTCG GGGACGGCAT 

201 CATGTTCGCC GCCGGACGAA TTTGGGGGCA GArArTCCTA rGGTTCArAC 

251 CTATTGCGsG CATCATGACG CCGrAACGTT ATGAGCAGGT TCAGGAAAAA 

301 TTCGACAAAT ACGGTAACTG GGTCTTATTT GTCGCCCGTT TCCTGCCCGG 

351 TTTGAGAACG GCCGTATTTG TTACAGCCGG TATCAGCCGC AAGGTTTCAT 

401 ACTTGCGTTT TATCATTATG GATGGACTGG CCGCA. . . 

This corresponds to the amino acid sequence <SEQ ID 720; ORF78>:



1 MFAFLEAFFV EYGYAAVFFV LVICGFGVPI PEDLTLVTGG VISGMGYTNP

51 HIMFAVGMLG VLVGDGIMFA AGRIWGQXXL XFXPIAXIMT PXRYEQVQEK

101 FDKYGNWVLF VARFLPGLRT AVFVTAGISR KVSYLRFIIM DGLAA...

Further work revealed the complete nucleotide sequence <SEQ ID 721>:

1 ATGTTTGCTT TTTTAGAAGC CTTTTTTGTC GAATACGGTT ATGCGGCTGT 

51 TTTTTTTGTA TTGGTCATCT GCGGTTTCGG CGTGCCGATT CCCGAGGATT 

101 TGACCTTGGT AACAGGCGGC GTGATTTCGG GTATGGGTTA TACCAATCCG 

151 CATATTATGT TTGCAGTCGG TATGCTCGGC GTATTGGTCG GGGACGGCAT 

201 CATGTTCGCC GCCGGACGAA TTTGGGGGCA GAAAATCCTA AGGTTCAAAC 

251 CTATTGCGCG CATCATGACG CCGAAACGTT ATGAGCAGGT TCAGGAAAAA 

301 TTCGACAAAT ACGGTAACTG GGTCTTATTT GTCGCCCGTT TCCTGCCCGG 

351 TTTGAGAACG GCCGTATTTG TTACAGCCGG TATCAGCCGC AAGGTTTCAT 

401 ACTTGCGTTT TATCATTATG GATGGACTGG CCGCACTGAT TTCCGTCCCT 

451 ATTTGGATTT ATCTGGGCGA ATACGGTGCG CACAACATCG ATTGGCTGAT 
501 GGCGAAAATG CACAGCCTGC AATCGGGTAT TTTTGTTATC TTGGGTATAG

551 GTGCGACCGT TGTCGCTTGG ATTTGGTGGA AAAAACGCCA ACGTATCCAG 

601 TTTTACCGCA GCAAATTGAA AGAAAAGCGG GCGCAACGCA AAGCCGCCAA 

651 GGCAGCCAAA AAAGCCGCGC AAAGCAAACA ATAA 

This corresponds to the amino acid sequence <SEQ ID 722; ORF78-1>:

1 MFAFLEAFFV EYGYAAVFFV LVICGFGVPI PEDLTLVTGG VISGMGYTNP

51 HIMFAVGMLG VLVGDGIMFA AGRIWGQKIL RFKPIARIMT PKRYEQVQEK

101 FDKYGNWVLF VARFLPGLRT AVFVTAGISR KVSYLRFIIM DGLAALISVP

151 IWIYLGEYGA HNIDWLMAKM HSLQSGIFVI LGIGATVVAW IWWKKRQRIQ

201 FYRSKLKEKR AQRKAAKAAK KAAQSKQ*
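Each amino acid sequence in these examples follows from its nucleotide sequence by translating reading frame 1 with the standard genetic code. The sketch below makes a few assumptions of its own. Stop codons print as '*'. Any codon containing an ambiguity code (such as r, s or N) prints as 'X'. An incomplete trailing codon is dropped. The function name is illustrative.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI translation table 1), codons in TCAG order: TTT, TTC, TTA, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna: str) -> str:
    """Translate reading frame 1; whitespace and case are ignored.

    Codons containing ambiguity codes become 'X', stop codons become '*',
    and an incomplete trailing codon is dropped.
    """
    seq = "".join(dna.split()).upper()
    usable = len(seq) - len(seq) % 3
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, usable, 3))


if __name__ == "__main__":
    # First 30 nt and last 24 nt of SEQ ID 721.
    print(translate("ATGTTTGCTT TTTTAGAAGC CTTTTTTGTC"))  # MFAFLEAFFV
    print(translate("AAAGCCGCGC AAAGCAAACA ATAA"))        # KAAQSKQ*
```

The two calls reproduce residues 1-10 and 221-227 (plus the stop) of ORF78-1.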

Computer analysis of this amino acid sequence predicts several transmembrane domains, and also 
gave the following results: 

Homology with the dedA homologue of H. influenzae (accession number P45280)
ORF78 and the dedA homologue show 58% aa identity in 144aa overlap: 

Orf78: 4 FLEAFFVEYGYAAVFFVLVICGFGVPIPEDLTLVTGGVISGM--GYTNPHIMFAVGMLGV 61

FL FF EYGY AV FVL+ICGFGVPIPED+TLV+GGVI+G+ N H+M V M+GV 

DedA: 20 FLIGFFTEYGYWAVLFVLIICGFGVPIPEDITLVSGGVIAGLYPENVNSHLMLLVSMIGV 79

Orf78: 62 LVGDGIMFAAGRIWGQXXLXFXPIAXIMTPXRYEQVQEKFDKYGNWVLFVARFLPGLRTA 121 

L GD M+ GRI+G L F PI I+T R V+EKF +YGN VLFVARFLPGLR 
DedA: 80 LAGDSCMYWLGRIYGTKILRFRPIRRIVTLQRLRMVREKFSQYGNRVLFVARFLPGLRAP 139 

Orf78: 122 VFVTAGISRKVSYLRFIIMDGLAA 145 

+++ +GI+R+VSY+RF+++D AA 
DedA: 140 IYMVSGITRRVSYVRFVLIDFCAA 163 

Homology with a predicted ORF from N. meningitidis (strain A)

ORF78 shows 93.8% identity over a 145aa overlap with an ORF (ORF78a) from strain A of N. 
meningitidis: 

10 20 30 40 50 60

orf78.pep MFAFLEAFFVEYGYAAVFFVLVICGFGVPIPEDLTLVTGGVISGMGYTNPHIMFAVGMLG

I I I : I I I I I 1 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I 
orf78a MFALLEAFFVEYGYAAVFFVLVICGFGVPIPEDLTLVTGGVISGMGYTNPHIMFAVGMLG

10 20 30 40 50 60 

70 80 90 100 110 120 

orf78.pep VLVGDGIMFAAGRIWGQXXLXFXPIAXIMTPXRYEQVQEKFDKYGNWVLFVARFLPGLRT

I 1 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
orf78a VLVGDGIMFAAGRIWGQKILKFKPIARIMTPKRYAQVQEKFDKYGNWVLFVARFLPGLRT

70 80 90 100 110 120 



130 140 
orf78.pep AVFVTAGISRKVSYLRFIIMDGLAA
I II 11 I IN I II Mil I:! MM I I 
orf78a AVFVTAGISRKVSYLRFLIMDGLAALISVPVWIYLGEYGAHNIDWLMAKMHSLQSGIFIA

130 140 150 160 170 180 

The complete length ORF78a nucleotide sequence <SEQ ID 723> is: 



1 ATGTTTGCCC TTTTGGAAGC CTTTTTTGTC GAATACGGCT ATGCGGCCGT 

51 GTTTTTCGTT TTGGTCATCT GCGGTTTCGG CGTGCCGATT CCCGAGGATT 

101 TGACCTTGGT AACAGGCGGC GTGATTTCGG GTATGGGTTA TACCAATCCG 

151 CATATTATGT TTGCAGTCGG TATGCTCGGC GTATTGGTCG GGGACGGCAT 

201 CATGTTCGCC GCCGGACGCA TCTGGGGGCA GAAAATCCTC AAGTTCAAAC 

251 CGATTGCGCG CATCATGACG CCGAAACGTT ACGCACAGGT TCAGGAAAAA 

301 TTCGACAAAT ACGGCAACTG GGTGTTATTT GTCGCTCGTT TCCTGCCCGG 

351 TTTGCGGACT GCCGTTTTCG TTACCGCCGG CATCAGCCGC AAAGTATCGT 

401 ATCTGCGCTT TCTGATTATG GACGGGCTTG CCGCGCTGAT TTCCGTGCCC 

451 GTTTGGATTT ACTTGGGCGA GTACGGCGCG CACAACATCG ATTGGCTGAT 



WO 99/24578 



-411- 



PCT/IB98/01665 



501 GGCGAAAATG CACAGCCTGC AATCCGGCAT CTTCATCGCA TTGGGCGTGC 

551 TGGCGGCGGC GCTGGCGTGG TTCTGGTGGC GCAAACGCCG ACATTATCAG 

601 CTTTACCGCG CACAATTGAG CGAAAAACGC GCCAAACGCA AGGCGGAAAA 

651 GGCAGCGAAA AAAGCGGCAC AGAAGCAGCA GTAA 

This encodes a protein having amino acid sequence <SEQ ID 724>: 



1 MFALLEAFFV EYGYAAVFFV LVICGFGVPI PEDLTLVTGG VISGMGYTNP

51 HIMFAVGMLG VLVGDGIMFA AGRIWGQKIL KFKPIARIMT PKRYAQVQEK

101 FDKYGNWVLF VARFLPGLRT AVFVTAGISR KVSYLRFLIM DGLAALISVP

151 VWIYLGEYGA HNIDWLMAKM HSLQSGIFIA LGVLAAALAW FWWRKRRHYQ

201 LYRAQLSEKR AKRKAEKAAK KAAQKQQ*

ORF78a and ORF78-1 show 89.0% identity in 227 aa overlap: 

10 20 30 40 50 60 

orf78a.pep MFALLEAFFVEYGYAAVFFVLVICGFGVPIPEDLTLVTGGVISGMGYTNPHIMFAVGMLG
I I I : I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
orf78-1 MFAFLEAFFVEYGYAAVFFVLVICGFGVPIPEDLTLVTGGVISGMGYTNPHIMFAVGMLG

10 20 30 40 50 60 

70 80 90 100 110 120 

orf78a.pep VLVGDGIMFAAGRIWGQKILKFKPIARIMTPKRYAQVQEKFDKYGNWVLFVARFLPGLRT
I I I I I I I I I I I I I I I I I II I : I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
orf78-1 VLVGDGIMFAAGRIWGQKILRFKPIARIMTPKRYEQVQEKFDKYGNWVLFVARFLPGLRT

70 80 90 100 110 120 



130 140 150 160 170 180 

orf78a.pep AVFVTAGISRKVSYLRFLIMDGLAALISVPVWIYLGEYGAHNIDWLMAKMHSLQSGIFIA

I I I I I I I I I I I I I I I I I : I I I I I I I I I I I I: I I I I I I I I I I I I I I I I I I I I I I I I I I I: 
orf78-1 AVFVTAGISRKVSYLRFIIMDGLAALISVPIWIYLGEYGAHNIDWLMAKMHSLQSGIFVI

130 140 150 160 170 180 



190 200 210 220 

orf78a.pep LGVLAAALAWFWWRKRRHYQLYRAQLSEKRAKRKAEKAAKKAAQKQQX
II: I : : : I I : I I : I I : : I : I I : : I : I I I I : I I I I I I II I I I : : I I
orf78-1 LGIGATVVAWIWWKKRQRIQFYRSKLKEKRAQRKAAKAAKKAAQSKQX

190 200 210 220 



Homology with a predicted ORF from N. gonorrhoeae

ORF78 shows 97.4% identity over 38 aa overlap with a predicted ORF (ORF78ng) from N. 
gonorrhoeae: 

orf78.pep XXLXFXPIAXIMTPXRYEQVQEKFDKYGNWVLFVARFLPGLRTAVFVTAGISRKVSYLRF 137
                                        ||||||||||||||||||||||||||||||
orf78ng                               YPVLFVARFLPGLRTAVFVTAGISRKVSYLRF 32

orf78.pep IIMDGLAA 145
          :|||||||
orf78ng   LIMDGLAALISVPVWIYLGEYGAHNIDWLMAKMHSLQSGIFIALGVLAAALAWFWWRKRR 92

The ORF78ng nucleotide sequence <SEQ ID 725> is predicted to encode a protein comprising 
amino acid sequence <SEQ ID 726>: 



1 ..YPVLFVARFL PGLRTAVFVT AGISRKVSYL RFLIMDGLAA LISVPVWIYL
51 GEYGAHNIDW LMAKMHSLQS GIFIALGVLA AALAWFWWRK RRHYQLYRAQ
101 LSEKRAKRKA EKAAKKAAQK QQ* 

Further work revealed the complete gonococcal nucleotide sequence <SEQ ID 727>: 



1 atgtttgccc tttTggaagc CTTTTTTGTC GAAtacggCt atgcGGCCGT 

51 GTTTTTCGTT TTGGTCATCT GCGGTTTCGG CGTGCCGATT CCCGAAGATT 

101 TGACCTTGGT AACGGGCGGC GTGATTTCGG GTATGGGTTA TACCAATCCG 

151 CATATTATGT TTGCGGTCGG TATGCTCGGC GTGTTGGCGG GCGACGGCGT 

201 GATGTTTGCC GCCGGACGCA TCTGGGGGCA GAAAATCCTC AAGTTCAAAC 

251 CGATTGCGCG CATCATGACG CCGAAACGTT ACGCGCAGGT TCAGGAAAAA 

301 TTCGACAAAT ACGGCAACTG GGTTCTGTTT GTCGCCCGTT TCCTGCCGGG 
351 TTTGCGGACT GCCGTTTTCG TTACCGCCGG CATCAGCCGC AAAGTATCGT

401 ATCTGCGCTT TCTGATTATG GACGGGCTGG CCGCGCTGAT TTCCGTGCCC 

451 GTTTGGATTT ACTTGGGCGA GTACGGCGCG CACAACATCG ATTGGCTGAT 

501 GGCGAAAATG CACAGCCTGC AATCGGGCAT CTTCATCGCA TTGGGCGTGC 

551 TGGCGGCGGC GCTGGCGTGG TTCTGGTGGC GCAAACGCCG ACATTATCAG 

601 CTTTACCGCG CACAATTGAG CGAAAAACGC GCCAAACGCA AGGCGGAAAA 

651 GGCAGCGAAA AAAGCGGCAC AGAAGCAGCA GTAa 

This corresponds to the amino acid sequence <SEQ ID 728; ORF78ng-1>:

1 MFALLEAFFV EYGYAAVFFV LVICGFGVPI PEDLTLVTGG VISGMGYTNP

51 HIMFAVGMLG VLAGDGVMFA AGRIWGQKIL KFKPIARIMT PKRYAQVQEK

101 FDKYGNWVLF VARFLPGLRT AVFVTAGISR KVSYLRFLIM DGLAALISVP

151 VWIYLGEYGA HNIDWLMAKM HSLQSGIFIA LGVLAAALAW FWWRKRRHYQ

201 LYRAQLSEKR AKRKAEKAAK KAAQKQQ*

ORF78ng-1 and ORF78-1 show 88.1% identity in 227 aa overlap:

10 20 30 40 50 60 

orf78-1.pep MFAFLEAFFVEYGYAAVFFVLVICGFGVPIPEDLTLVTGGVISGMGYTNPHIMFAVGMLG
I I I : I I I I I I I I I I I I I I I I I I I I I I I M I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
orf78ng-1 MFALLEAFFVEYGYAAVFFVLVICGFGVPIPEDLTLVTGGVISGMGYTNPHIMFAVGMLG

10 20 30 40 50 60 

70 80 90 100 110 120 

orf78-1.pep VLVGDGIMFAAGRIWGQKILRFKPIARIMTPKRYEQVQEKFDKYGNWVLFVARFLPGLRT
I I : I I I : I I I I I I I I I I I I I : I I I 1 II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
orf78ng-1 VLAGDGVMFAAGRIWGQKILKFKPIARIMTPKRYAQVQEKFDKYGNWVLFVARFLPGLRT

70 80 90 100 110 120 

130 140 150 160 170 180 

orf78-1.pep AVFVTAGISRKVSYLRFIIMDGLAALISVPIWIYLGEYGAHNIDWLMAKMHSLQSGIFVI

I I I I I I I I I I I I I I I I 1 : I I I I I I 1 1 I I I I : II I I I I I I I I I I I I I I I I 1 I I I I I I I I: 
orf78ng-1 AVFVTAGISRKVSYLRFLIMDGLAALISVPVWIYLGEYGAHNIDWLMAKMHSLQSGIFIA

130 140 150 160 170 180 

190 200 210 220 

orf78-1.pep LGIGATVVAWIWWKKRQRIQFYRSKLKEKRAQRKAAKAAKKAAQSKQX
II: |:::M:||:||:: I : I I : : I : I I I I : I I I I I I I I I I j : : I j
orf78ng-1 LGVLAAALAWFWWRKRRHYQLYRAQLSEKRAKRKAEKAAKKAAQKQQX

190 200 210 220 

Furthermore, ORF78ng-1 shows homology to the dedA protein from H. influenzae:

sp|P45280|YG29_HAEIN HYPOTHETICAL PROTEIN HI1629 >gi|1073983|pir||D64133 dedA
protein (dedA) homolog - Haemophilus influenzae (strain Rd KW20)
>gi|1574476 (U32836) dedA protein (dedA) [Haemophilus influenzae] Length = 212
Score = 223 bits (563), Expect = 7e-58

Identities = 108/182 (59%), Positives = 140/182 (76%), Gaps = 2/182 (1%)



Query:   5 LEAFFVEYGYAAVFFVLVICGFGVPIPEDLTLVTGGVISGM--GYTNPHIMFAVGMLGVL 62
           L  FF EYGY AV FVL+ICGFGVPIPED+TLV+GGVI+G+     N H+M  V M+GVL
Sbjct:  21 LIGFFTEYGYWAVLFVLIICGFGVPIPEDITLVSGGVIAGLYPENVNSHLMLLVSMIGVL 80

Query:  63 AGDGVMFAAGRIWGQKILKFKPIARIMTPKRYAQVQEKFDKYGNWVLFVARFLPGLRTAV 122
           AGD  M+  GRI+G KIL+F+PI RI+T +R   V+EKF +YGN VLFVARFLPGLR  +
Sbjct:  81 AGDSCMYWLGRIYGTKILRFRPIRRIVTLQRLRMVREKFSQYGNRVLFVARFLPGLRAPI 140

Query: 123 FVTAGISRKVSYLRFLIMDGLAALISVPVWIYLGEYGAHNIDWLMAKMHSLQSGIFIALG 182
           ++ +GI+R+VSY+RF+++D  AA+ISVP+WIYLGE GA N+DWL  ++   Q  I+I +G
Sbjct: 141 YMVSGITRRVSYVRFVLIDFCAAIISVPIWIYLGELGAKNLDWLHTQIQKGQIVIYIFIG 200

Query: 183 VL 184
            L
Sbjct: 201 YL 202








Based on this analysis, including the presence of putative transmembrane domains, it is predicted 
that these proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be useful 
antigens for vaccines or diagnostics, or for raising antibodies. 



Example 87 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 729>: 

1 ATGAAAAAAT TATTGGCGGC CGTGATGATG GCAGGTTTGG CAGGCGCGGT 

51 TTCCGCCGCC GGAGTCCACG TTGAGGACGG CTGGGCGCGC ACCACCGTCG 

101 AAGGTATGAA AATAGGCGGC GCGTTCATGA AAATCCACAA CGACGAAGCC 

151 AAACAAGACT TTTTGCTCGG CGGAAGCAGC CCCGTTGCCG ACCGCGTCGA 

201 AGTGCATACC CACATCAACG ACAACGGCGT GATGCGGATG CGCGAAGTCG 

251 AAGGCGGCGT GCCTTTGGAA GCGAAATCCG TTACCGAACT CAAACCCGGC 

301 AGCTATCATG TGATGTTTAT GGGTTTGAAA AAACAATTAA AAGAGGGCGA 

351 TAAAATTCCC GTTACCCTGA AATTTAAAAA CGCCAAAGCG CAAACCGTCC 

401 AACTGGAAGT CAAAATCGCG CCGATGCCGG CAATGAACCA C... 

This corresponds to the amino acid sequence <SEQ ID 730; ORF79>:

1 MKKLLAAVMM AGLAGAVSAA GVHVEDGWAR TTVEGMKIGG AFMKIHNDEA
51 KQDFLLGGSS PVADRVEVHT HINDNGVMRM REVEGGVPLE AKSVTELKPG
101 SYHVMFMGLK KQLKEGDKIP VTLKFKNAKA QTVQLEVKIA PMPAMNH...

Further work revealed the complete nucleotide sequence <SEQ ID 731>:

1 ATGAAAAAAT TATTGGCGGC CGTGATGATG GCAGGTTTGG CAGGCGCGGT 

51 TTCCGCCGCC GGAGTCCACG TTGAGGACGG CTGGGCGCGC ACCACCGTCG 

101 AAGGTATGAA AATAGGCGGC GCGTTCATGA AAATCCACAA CGACGAAGCC 

151 AAACAAGACT TTTTGCTCGG CGGAAGCAGC CCCGTTGCCG ACCGCGTCGA 

201 AGTGCATACC CACATCAACG ACAACGGCGT GATGCGGATG CGCGAAGTCG 

251 AAGGCGGCGT GCCTTTGGAA GCGAAATCCG TTACCGAACT CAAACCCGGC 

301 AGCTATCATG TGATGTTTAT GGGTTTGAAA AAACAATTAA AAGAGGGCGA 

351 TAAAATTCCC GTTACCCTGA AATTTAAAAA CGCCAAAGCG CAAACCGTCC 

401 AACTGGAAGT CAAAATCGCG CCGATGCCGG CAATGAACCA CGGTCATCAC 

451 CACGGCGAAG CGCATCAGCA CTAA 

This corresponds to the amino acid sequence <SEQ ID 732; ORF79-1>:

1 MKKLLAAVMM AGLAGAVSAA GVHVEDGWAR TTVEGMKIGG AFMKIHNDEA
51 KQDFLLGGSS PVADRVEVHT HINDNGVMRM REVEGGVPLE AKSVTELKPG
101 SYHVMFMGLK KQLKEGDKIP VTLKFKNAKA QTVQLEVKIA PMPAMNHGHH
151 HGEAHQH*

Computer analysis of this amino acid sequence revealed a putative leader peptide and also gave the 
following results: 

Homology with a predicted ORF from N. meningitidis (strain A) 

ORF79 shows 94.6% identity over a 147aa overlap with an ORF (ORF79a) from strain A of N.
meningitidis:



10 20 30 40 50 60 

orf79.pep MKKLLAAVMMAGLAGAVSAAGVHVEDGWARTTVEGMKIGGAFMKIHNDEAKQDFLLGGSS

II Mllllllllllll I I 1 I : I I I I I I I I I I I I I I I : I I I I I I I 1 I I I I I I I I I I I I M 
orf79a MKXLLAAVMMAGLAGAVSAAGIHVEDGWARTTVEGMKMGGAFMKIHNDEAKQDFLLGGSS

10 20 30 40 50 60 



70 80 90 100 110 120 

orf79.pep PVADRVEVHTHINDNGVMRMREVEGGVPLEAKSVTELKPGSYHVMFMGLKKQLKEGDKIP
I I I I II II I I I II I I t I I I I II I I I I I I I I I 1 I I i M I I M 1 I I I ! I I Mill Mill 
orf79a PVADRVEVHTHINDNGVMRMREVEGGVPLEAKSVTELKPGSYHVMFMGXKKQLKXGDKIP

70 80 90 100 110 120 
130 140
orf79.pep VTLKFKNAKAQTVQLEVKIAPMPAMNH
{ I I I i I 1 I I I 1 I I I I I I I III I I : I
orf79a VTLKFKNAKAQTVQLEVKTAPMSAMDHGHHHGEAHQHX

130 140 150 

The complete length ORF79a nucleotide sequence <SEQ ID 733> is: 

1 ATGAAANAAC TATTGGCAGC CGTGATGATG GCAGGTTTGG CAGGCGCGGT 

51 TTCCGCCGCC GGAATCCACG TTGAGGACGG CTGGGCGCGC ACCACCGTCG 

101 AAGGTATGAA AATGGGCGGC GCGTTCATGA AAATCCACAA CGACGAAGCC

151 AAACAAGACT TTTTGCTCGG CGGAAGCAGC CCTGTTGCCG ACCGCGTCGA 

201 AGTGCATACC CATATCAATG ATAACGGTGT GATGCGGATG CGCGAAGTCG 

251 AAGGCGGCGT GCCTTTGGAG GCGAAATCCG TTACCGAACT CAAACCCGGC 

301 AGCTATCATG TCATGTTTAT GGGTNTGAAA AAACAATTAA AAGANGGCGA 

351 CAAGATTCCC GTTACCCTGA AATTTAAAAA CGCCAAAGCA CAAACCGTCC

401 AACTGGAAGT CAAAACCGCG CCGATGTCGG CAATGGACCA CGGTCATCAC 

451 CACGGCGAAG CGCATCAGCA CTAA 

This encodes a protein having amino acid sequence <SEQ ID 734>: 

1 MKXLLAAVMM AGLAGAVSAA GIHVEDGWAR TTVEGMKMGG AFMKIHNDEA
51 KQDFLLGGSS PVADRVEVHT HINDNGVMRM REVEGGVPLE AKSVTELKPG

101 SYHVMFMGXK KQLKXGDKIP VTLKFKNAKA QTVQLEVKTA PMSAMDHGHH 
151 HGEAHQH* 

ORF79a and ORF79-1 show 94.9% identity in 157 aa overlap: 

10 20 30 40 50 60 

orf79a.pep MKXLLAAVMMAGLAGAVSAAGIHVEDGWARTTVEGMKMGGAFMKIHNDEAKQDFLLGGSS

M | | I I I I I I I I 1 1 I I I I i I : I I II I I I I I I I I I I I : I I I I I I I I II I I I I I I I I I I I 1 
orf79-1 MKKLLAAVMMAGLAGAVSAAGVHVEDGWARTTVEGMKIGGAFMKIHNDEAKQDFLLGGSS

10 20 30 40 50 60 

70 80 90 100 110 120

orf79a.pep PVADRVEVHTHINDNGVMRMREVEGGVPLEAKSVTELKPGSYHVMFMGXKKQLKXGDKIP
I I I || I I I I I I I I I I I 1 I I I I I I I I I I I I II I I I I I I 1 I I I I I I I I I! I I I I I I I I 1 I 
orf79-1 PVADRVEVHTHINDNGVMRMREVEGGVPLEAKSVTELKPGSYHVMFMGLKKQLKEGDKIP

70 80 90 100 110 120 


130 140 150 

orf79a.pep VTLKFKNAKAQTVQLEVKTAPMSAMDHGHHHGEAHQHX
I I I I I I I I 1 I I I I I I II I III I I : I II I I I M I i I I 
orf79-1 VTLKFKNAKAQTVQLEVKIAPMPAMNHGHHHGEAHQHX
130 140 150
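The identity figures in these comparisons are consistent with a simple calculation: count the identical residues over the stated overlap and divide by its length. For instance, ORF79a and ORF79-1 differ at 8 of 157 positions, and 149/157 rounds to 94.9%. A minimal sketch follows. It assumes an ungapped alignment and treats an unresolved residue (X) as a mismatch. Gapped alignments would need the alignment program's own conventions for gaps.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent identical positions in an ungapped alignment of equal-length sequences.

    Whitespace is ignored. An unresolved residue (X) never counts as a match.
    """
    a = "".join(a.split()).upper()
    b = "".join(b.split()).upper()
    if not a or len(a) != len(b):
        raise ValueError("expected two non-empty aligned sequences of equal length")
    matches = sum(1 for x, y in zip(a, b) if x == y and x != "X")
    return 100.0 * matches / len(a)


# ORF79a (SEQ ID 734) and ORF79-1 (SEQ ID 732), without the terminal stop.
ORF79A = """MKXLLAAVMM AGLAGAVSAA GIHVEDGWAR TTVEGMKMGG AFMKIHNDEA
KQDFLLGGSS PVADRVEVHT HINDNGVMRM REVEGGVPLE AKSVTELKPG
SYHVMFMGXK KQLKXGDKIP VTLKFKNAKA QTVQLEVKTA PMSAMDHGHH
HGEAHQH"""
ORF79_1 = """MKKLLAAVMM AGLAGAVSAA GVHVEDGWAR TTVEGMKIGG AFMKIHNDEA
KQDFLLGGSS PVADRVEVHT HINDNGVMRM REVEGGVPLE AKSVTELKPG
SYHVMFMGLK KQLKEGDKIP VTLKFKNAKA QTVQLEVKIA PMPAMNHGHH
HGEAHQH"""

if __name__ == "__main__":
    print(f"{percent_identity(ORF79A, ORF79_1):.1f}% identity")  # 94.9% identity
```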

Homology with a predicted ORF from N. gonorrhoeae 

ORF79 shows 96.1% identity over 76 aa overlap with a predicted ORF (ORF79ng) from 
N. gonorrhoeae: 

orf79.pep FMKIHNDEAKQDFLLGGSSPVADRVEVHTHINDNGVMRMREVEGGVPLEAKSVTELKPGS 101
                                        ||||||||||||:|||||||||||||||||
orf79ng                                 INDNGVMRMREVKGGVPLEAKSVTELKPGS 30

orf79.pep YHVMFMGLKKQLKEGDKIPVTLKFKNAKAQTVQLEVKIAPMPAMNH 147
^ I I I II I I I I I I ! I M I I I I I II I I I I I I II I I I I I I I I I I I I I I
orf79ng   YHVMFMGLKKQLKEGDKIPVTLKFKNAKAQTVQLEVKTAPMSAMNHGHHHGEAHQH 86

An ORF79ng nucleotide sequence <SEQ ID 735> was predicted to encode a protein comprising
amino acid sequence <SEQ ID 736>: 

1 ..INDNGVMRMR EVKGGVPLEA KSVTELKPGS YHVMFMGLKK QLKEGDKIPV
51 TLKFKNAKAQ TVQLEVKTAP MSAMNHGHHH GEAHQH*

Further work revealed the complete gonococcal DNA sequence <SEQ ID 737>:
1 ATGAAAAAAT TATTGGCAGC CGTGATGATG GCAGGTTTGG CAGGCGCGGT

51 TTccgccgCc GGagTccAtG TCGAggACGG CTGGGCGCGc accaCTGtcg 

101 aaggtATgaa aatggGCGGC GCgttCATga aaATCCACAA CGACGaaGcc 

151 atacaaGACt ttgtgcTCgg CGGaagcatg cccgttgccg accgcGTCGA 

201 AGTGCAtaca cacATCAACG ACAACGGCGT GATGCGTATG CGCGAAGTCA

251 AAGGCGGCGT GCCTTTGGAG GCGAAATCCG TTACCGAACT CAAACCCGGC 

301 AGCTATCACG TGATGTTTAT GGGTTTGAAA AAACAACTGA AAGAGGGCGA 

351 CAAGATTCCC GTTACCCTGA AATTTAAAAA CGCCAAAGCG CAAACCGTCC 

401 AACTGGAAGT CAAAACCGCG CCGATGTCGG CAATGAACCA CGGTCATCAC 

451 CACGGCGAAG CGCATCAGCA CTAA

This corresponds to the amino acid sequence <SEQ ID 738; ORF79ng-1>:

1 MKKLLAAVMM AGLAGAVSAA GVHVEDGWAR TTVEGMKMGG AFMKIHNDEA

51 IQDFVLGGSM PVADRVEVHT HINDNGVMRM REVKGGVPLE AKSVTELKPG 

101 SYHVMFMGLK KQLKEGDKIP VTLKFKNAKA QTVQLEVKTA PMSAMNHGHH 

151 HGEAHQH*

ORF79ng-1 and ORF79-1 show 95.5% identity in 157 aa overlap:

10 20 30 40 50 60 

orf79-1.pep MKKLLAAVMMAGLAGAVSAAGVHVEDGWARTTVEGMKIGGAFMKIHNDEAKQDFLLGGSS
M I I II M I It I II I I M M I I I I I I I I I I I I I I I I I : I I I I I I I I I I I I llhllll 
orf79ng-1 MKKLLAAVMMAGLAGAVSAAGVHVEDGWARTTVEGMKMGGAFMKIHNDEAIQDFVLGGSM

10 20 30 40 50 60 

70 80 90 100 110 120 

orf79-1.pep PVADRVEVHTHINDNGVMRMREVEGGVPLEAKSVTELKPGSYHVMFMGLKKQLKEGDKIP
I I I I I I I I I I I I I I I I I I I I I I I : I I I I I I I I I M I I I II I I I I I I II I I I I I I I I I I I I

orf79ng-1 PVADRVEVHTHINDNGVMRMREVKGGVPLEAKSVTELKPGSYHVMFMGLKKQLKEGDKIP

70 80 90 100 110 120 

130 140 150 

orf79-1.pep VTLKFKNAKAQTVQLEVKIAPMPAMNHGHHHGEAHQHX

I I I II I II II I I I I II I I III I I I I I I I I I I I I I I I 
orf79ng-1 VTLKFKNAKAQTVQLEVKTAPMSAMNHGHHHGEAHQHX

130 140 150 

Furthermore, ORF79ng-1 shows significant homology to a protein from Aquifex aeolicus:

gi|2983695 (AE000731) putative protein [Aquifex aeolicus] Length = 151

Score = 63.6 bits (152), Expect = 6e-10

Identities = 38/114 (33%), Positives = 58/114 (50%), Gaps = 1/114 (0%)

Query:  24 VEDGWARTTVEGMKMGGAFMKIHNDEAIQDFVLGGSMPVADRVEVHTHINDNGVMRMREV 83
           V+  W      G       M I N+    D+++G    +A RVE+H  + +N V +M
Sbjct:  27 VKHPWVMEPPPGPNTTMMGMIIVNEGDEPDYLIGAKTDIAQRVELHKTVIENDVAKMVPQ 86

Query:  84 KGGVPLEAKSVTELKPGSYHVMFMGLKKQLKEGDKIPVTLKFKNAKAQTVQLEV 137
           +  + +  K   E K   YHVM +GLKK++KEGDK+ V L F+ +   TV+  V
Sbjct:  87 ER-IEIPPKGKVEFKHHGYHVMIIGLKKRIKEGDKVKVELIFEKSGKITVEAPV 139

Based on this analysis, it is predicted that the proteins from N. meningitidis and N. gonorrhoeae, and 
their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies. 

ORF79-1 (15.6 kDa) was cloned in the pET vector and expressed in E. coli, as described above. The
products of protein expression and purification were analyzed by SDS-PAGE. Figure 18A shows
the results of affinity purification of the His-fusion protein. Purified His-fusion protein was used
to immunise mice, whose sera were used for ELISA (positive result) and FACS analysis (Figure
18B). These experiments confirm that ORF79-1 is a surface-exposed protein, and that it is a useful
immunogen.
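A molecular-weight figure such as the 15.6 kDa quoted for ORF79-1 can be cross-checked against a sequence by summing average residue masses. The sketch below rests on stated assumptions: standard average residue masses, plus one water for the free termini. It makes no allowance for leader-peptide removal, fusion-tag residues or modifications. The expressed construct is not fully specified here, so the printed value need not equal the quoted one.

```python
# Average amino-acid residue masses in daltons (free amino acid minus one water).
RESIDUE_MASS = {
    "A": 71.0788, "R": 156.1875, "N": 114.1038, "D": 115.0886, "C": 103.1388,
    "E": 129.1155, "Q": 128.1307, "G": 57.0519, "H": 137.1411, "I": 113.1594,
    "L": 113.1594, "K": 128.1741, "M": 131.1926, "F": 147.1766, "P": 97.1167,
    "S": 87.0782, "T": 101.1051, "W": 186.2132, "Y": 163.1760, "V": 99.1326,
}
WATER = 18.01528  # one water restores the free N- and C-termini


def average_mass_kda(protein: str) -> float:
    """Average mass of an unmodified polypeptide in kDa; a trailing '*' is ignored."""
    seq = "".join(protein.split()).upper().rstrip("*")
    unknown = set(seq) - set(RESIDUE_MASS)
    if unknown:
        raise ValueError(f"unsupported residue codes: {sorted(unknown)}")
    return (sum(RESIDUE_MASS[aa] for aa in seq) + WATER) / 1000.0


if __name__ == "__main__":
    orf79_1 = """MKKLLAAVMM AGLAGAVSAA GVHVEDGWAR TTVEGMKIGG AFMKIHNDEA
    KQDFLLGGSS PVADRVEVHT HINDNGVMRM REVEGGVPLE AKSVTELKPG
    SYHVMFMGLK KQLKEGDKIP VTLKFKNAKA QTVQLEVKIA PMPAMNHGHH
    HGEAHQH*"""
    print(f"{average_mass_kda(orf79_1):.1f} kDa for the full-length, unprocessed sequence")
```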
Example 88

The following DNA sequence, believed to be complete, was identified in N. meningitidis <SEQ ID
739>:

1 ATGACGGTAA CTGCGGCCGA AGGCGGCAAA GCTGCCAAGG CGTTAAAAAA 

51 ATATCTGATT ACGGGCATTT TGGTCTGGCT GCCGATTGCG GTAACGGTTT 

101 GGGTGGTTTC CTATATCGTT TCCGCGTCCG ATCAGCTCGT CAACCTGCTG 

151 CCGAAGCAAT GGCGGCCGCA ATATGTTTTG GGGTTTAATA TCCCGGGGCT 

201 GGGCGTTATC GTTGCCATTG CCGTATTGTT TGTAACCGGA TTGTTTGCCG 

251 CCAACGTATT GGGTCGGCAG ATCCTCGCCG CGTGGGACAG CCTGTTGGGG 

301 CGGATTCCGG TTGTGAAAtC CATCTATTCG AGTGTGAAAA AAGTATCCGA 

351 ATacgTGCTG TCCGACAGCA GCCGTTCGTT TAAAACGCCG GTACTCGTGC 

401 CGTTTCCCCA GCCCGGTATT TGGACGATyG CTTTCGTGTC AGGGCAGGTG 

451 TCGAATGCGG TTAAGGCCGC ATTGCCGAAs GACGGCGATT ATCTTTCCGT 

501 GTATGTTCCG ACCACGCCGA ATCCGACCGG CGGTTACTAT ATTATGGTAA 

551 AGAAAAGCGA TGTGCGCGAA CTCGATATGA GCGTGGACGA AsCATTGAAA 

601 TATGTGATTT CGCTGGGTAT GGTCATCCCT GACGACCTGC CCGTCAAAAC 

651 ATTGGCAsGA CCTATGCCGT CTGAAAAGGC GGATTTGCCC GAACAACAAT 

701 AA 

This corresponds to the amino acid sequence <SEQ ID 740; ORF98>: 

1 MTVTAAEGGK AAKALKKYLI TGILVWLPIA VTVWVVSYIV SASDQLVNLL

51 PKQWRPQYVL GFNIPGLGVI VAIAVLFVTG LFAANVLGRQ ILAAWDSLLG 

101 RIPVVKSIYS SVKKVSEYVL SDSSRSFKTP VLVPFPQPGI WTIAFVSGQV

151 SNAVKAALPX DGDYLSVYVP TTPNPTGGYY IMVKKSDVRE LDMSVDEXLK 

201 YVISLGMVIP DDLPVKTLAX PMPSEKADLP EQQ* 
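The ORF98 DNA above is described as believed to be complete. That is, it begins with an ATG start codon and ends at an in-frame stop codon (here TAA), with no stop codon in between. The hypothetical helper below checks exactly those properties. It assumes the standard stop codons TAA, TAG and TGA. It does not test whether upstream sequence could extend the ORF.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def looks_like_complete_orf(dna: str) -> bool:
    """True if frame 1 runs ATG ... stop with no internal stop codon.

    Whitespace and case are ignored; alternative start codons (e.g. GTG) are not accepted.
    """
    seq = "".join(dna.split()).upper()
    if len(seq) < 6 or len(seq) % 3 != 0 or not seq.startswith("ATG"):
        return False
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    return codons[-1] in STOP_CODONS and not any(c in STOP_CODONS for c in codons[:-1])


if __name__ == "__main__":
    print(looks_like_complete_orf("ATG AAA TAA"))         # True
    print(looks_like_complete_orf("ATGTAAAAATAA"))        # False: internal stop
    print(looks_like_complete_orf("ATGAAAAAAT TAT..."))   # False: partial sequence
```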

Further work revealed the complete nucleotide sequence <SEQ ID 741>:

1 ATGACGGAAC nTGCGGCCGA AGGCGGCAAA GCTGCCAArG CGTTAAAAAA 

51 ATATCTGATT ACGGGCATTT TGGTCTGGCT GCCGATTGCG GTAACGGTTT 

101 GGGTGGTTTC CTATATCGTT TCCGCGTCCG ATCAGCTCGT CAACCTGCTG 

151 CCGAAGCAAT GGCGGCCGCA ATATGTTTTG GGGTTTAATA TCCCGGGGCT 

201 GGGCGTTATC GTTGCCATTG CCGTATTGTT TGTAACCGGA TTGTTTGCCG 

251 CCAACGTATT GGGTCGGCAG ATCCTCGCCG CGTGGGACAG CCTGTTGGGG 

301 CGGATTCCGG TTGTGAAATC CATCTATTCG AGTGTGAAAA AAGTATCCGA 

351 ATCGCTGCTG TCCGACAGCA GCCGTTCGTT TAAAACGCCG GTACTCGTGC 

401 CGTTTCCCCA GCCCGGTATT TGGACGATTG CTTTCGTGTC AGGGCAGGTG 

451 TCGAATGCGG TTAAGGCCGC ATTGCCGAAG GACGGCGATT ATCTTTCCGT 

501 GTATGTTCCG ACCACGCCGA ATCCGACCGG CGGTTACTAT ATTATGGTAA 

551 AGAAAAGCGA TGTGCGCGAA CTCGATATGA GCGTGGACGA AGCATTGAAA 

601 TATGTGATTT CGCTGGGTAT GGTCATCCCT GACGACCTGC CCGTCAAAAC 

651 ATTGGCAGGA CCTATGCCGT CTGAAAAGGC GGATTTGCCC GAACAACAAT 

701 AA 

This corresponds to the amino acid sequence <SEQ ID 742; ORF98-1>:

1 MTEXAAEGGK AAKALKKYLI TGILVWLPIA VTVWVVSYIV SASDQLVNLL

51 PKQWRPQYVL GFNIPGLGVI VAIAVLFVTG LFAANVLGRQ ILAAWDSLLG

101 RIPVVKSIYS SVKKVSESLL SDSSRSFKTP VLVPFPQPGI WTIAFVSGQV

151 SNAVKAALPK DGDYLSVYVP TTPNPTGGYY IMVKKSDVRE LDMSVDEALK 

201 YVISLGMVIP DDLPVKTLAG PMPSEKADLP EQQ* 

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A)

ORF98 shows 96.1% identity over a 233aa overlap with an ORF (ORF98a) from strain A of N. 
meningitidis: 

10 20 30 40 50 60 

orf98.pep MTVTAAEGGKAAKALKKYLITGILVWLPIAVTVWVVSYIVSASDQLVNLLPKQWRPQYVL
II I II I I I I I I I II I I I I I I M I I I I I I I I I M II M M I I I I I I I I I t I I I I II I I I 
orf98a MTEPAAEGGKAAKALKKYLITGILVWLPIAVTVWVVSYIVSASDQLVNLLPKQWRPQYVL

10 20 30 40 50 60 
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70 80 90 100 110 120 

orf98.pep GFNIPGLGVIVAIAVLFVTGLFAANVLGRQILAAWDSLLGRIPVVKSIYSSVKKVSEYVL

I I I I I I I i I I I I I M I It I I I i I I I I I I MMII! IMIIIIM] :| 

orf98a GFNIPGLGVIVAIAVLFVTGLFAANVLGRQILAAWDSLLGRIPVVKSIYSSVKKVSXSLL

70 80 90 100 110 120 

130 140 150 160 170 180 

orf98.pep SDSSRSFKTPVLVPFPQPGIWTIAFVSGQVSNAVKAALPXDGDYLSVYVPTTPNPTGGYY
I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I M I I 
orf98a SDSSRSFKTPVLVPFPQSGIWTIAFVSGQVSNAVKAALPKDGDYLSVYVPTTPNPTGGYY 

130 140 150 160 170 180 

190 200 210 220 230 

orf98.pep IMVKKSDVRELDMSVDEXLKYVISLGMVIPDDLPVKTLAXPMPSEKADLPEQQX

| I I I I I I I I I I I I I I I I I I I I I II I I I I I I I t I I I I I I I I I I I I I I I I I I I I 
orf98a IMVKKSDVRELDMSVDEALKYVISLGMVIPDDLPVKTLAGPMPSEKADLPEQQX

190 200 210 220 230 

The complete length ORF98a nucleotide sequence <SEQ ID 743> is: 



1 ATGACGGAAC CTGCGGCCGA AGGCGGCAAA GCTGCCAAGG CGTTAAAAAA

51 ATATCTGATT ACGGGCATTT TGGTCTGGCT GCCGATTGCG GTAACGGTTT

101 GGGTGGTTTC CTATATCGTT TCCGCGTCCG ATCAGCTCGT CAACCTGCTG

151 CCGAAGCAAT GGCGGCCGCA ATATGTTTTG GGGTTTAATA TCCCGGGGCT

201 GGGCGTTATC GTTGCCATTG CCGTATTGTT TGTAACCGGA TTATTTGCCG

251 CAAACGTATT GGGCCGGCAG ATTCTTGCCG CGTGGGACAG CTTGTTGGGG

301 CGGATTCCGG TTGTGAAGTC CATCTATTCG AGTGTGAAAA AAGTATCCGA

351 NTCGTTGCTG TCCGACAGCA GCCGTTCGTT TAAAACACCA GTACTCGTGC

401 CGTTTCCCCA ATCGGGTATT TGGACAATCG CATTCGTGTC CGGTCAGGTG

451 TCGAATGCGG TTAAGGCCGC ATTGCCGAAG GACGGCGATT ATCTTTCCGT

501 GTATGTTCCG ACCACGCCGA ATCCGACCGG CGGTTACTAT ATTATGGTAA

551 AGAAAAGCGA TGTGCGCGAA CTCGATATGA GCGTGGACGA AGCGTTGAAA

601 TATGTGATTT CGCTGGGTAT GGTCATCCCT GACGACCTGC CCGTCAAAAC

651 ATTGGCAGGA CCTATGCCGT CTGAAAAGGC GGATTTGCCC GAACAACAAT

701 AA



This encodes a protein having amino acid sequence <SEQ ID 744>: 



1 MTEPAAEGGK AAKALKKYLI TGILVWLPIA VTVWVVSYIV SASDQLVNLL

51 PKQWRPQYVL GFNIPGLGVI VAIAVLFVTG LFAANVLGRQ ILAAWDSLLG

101 RIPVVKSIYS SVKKVSXSLL SDSSRSFKTP VLVPFPQSGI WTIAFVSGQV

151 SNAVKAALPK DGDYLSVYVP TTPNPTGGYY IMVKKSDVRE LDMSVDEALK 

201 YVISLGMVIP DDLPVKTLAG PMPSEKADLP EQQ* 

ORF98a and ORF98-1 show 98.7% identity in 233 aa overlap: 



10 20 30 40 50 60 

orf98a.pep MTEPAAEGGKAAKALKKYLITGILVWLPIAVTVWVVSYIVSASDQLVNLLPKQWRPQYVL

II! I I I I II I I I I II II I I I 11 I I II I I I I I I I I I I I I I I I M I I I I I I I I I I I I I I I I 
orf98-1 MTEXAAEGGKAAKALKKYLITGILVWLPIAVTVWVVSYIVSASDQLVNLLPKQWRPQYVL

10 20 30 40 50 60 

70 80 90 100 110 120 

orf98a.pep GFNIPGLGVIVAIAVLFVTGLFAANVLGRQILAAWDSLLGRIPVVKSIYSSVKKVSXSLL
I I I I I I I II I II I I I II 1 I I I I I I I I I I I I I I I II I I i I I I I I I I 1 I I I I I I I I II III 
orf98-1 GFNIPGLGVIVAIAVLFVTGLFAANVLGRQILAAWDSLLGRIPVVKSIYSSVKKVSESLL

70 80 90 100 110 120 

130 140 150 160 170 180 

orf98a.pep SDSSRSFKTPVLVPFPQSGIWTIAFVSGQVSNAVKAALPKDGDYLSVYVPTTPNPTGGYY
I I I I I I I I I I I I II I I I I I 1 I I I I I I I I I M I I I I ! I I I I I I I I I I I I I I I I I I II I I I 
orf98-1 SDSSRSFKTPVLVPFPQPGIWTIAFVSGQVSNAVKAALPKDGDYLSVYVPTTPNPTGGYY

130 140 150 160 170 180 

190 200 210 220 230 

orf98a.pep IMVKKSDVRELDMSVDEALKYVISLGMVIPDDLPVKTLAGPMPSEKADLPEQQX
I I I 1 1 II I I I I I I I I I I I M I I I I I I I I M I I I I I I I I I I I I 1 I M I M I I I I I 
orf98-1 IMVKKSDVRELDMSVDEALKYVISLGMVIPDDLPVKTLAGPMPSEKADLPEQQX

190 200 210 220 230 
Homology with a predicted ORF from N. gonorrhoeae

ORF98 shows 95.3% identity over a 233 aa overlap with a predicted ORF (ORF98ng) from 
N. gonorrhoeae: 

10 20 30 40 50 60 

orf98.pep MTVTAAEGGKAAKALKKYLITGILVWLPIAVTVWVVSYIVSASDQLVNLLPKQWRPQYVL 60

II 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 II 1 1 1 1 II 1 1 II 1 1 

orf98ng MTEPAAEGGKAAKALKKYLITGILVWLPIAVTVWVVSYIVSASDQLVNLLPKQWRPQYVL 60

orf98.pep GFNIPGLGVIVAIAVLFVTGLFAANVLGRQILAAWDSLLGRIPVVKSIYSSVKKVSEYVL 120

I | | | I I I | I I I I || I I I II I I I II I I I I I I I I I M M I I I I I I I I I I I I I I I I I I I :l 
orf98ng GFNIPGLGVIVAIAVLFVTGLFAANVLGRQILAAWDSLLXRIPVVKSIYSSVKKVSESLL 120

orf98.pep SDSSRSFKTPVLVPFPQPGIWTIAFVSGQVSNAVKAALPXDGDYLSVYVPTTPNPTGGYY 180

I II II I III I II I I I I I I IINI M M II I II M III I I II I I I II II I INI MM I 
orf98ng SDSSRSFKTPVLVPFPQSGIWTIAFVSGQVSNAVKAALPQDGDYLSVYVPTTPNPTGGYY 180 

orf98.pep IMVKKSDVRELDMSVDEXLKYVISLGMVIPDDLPVKTLAXPMPSEKADLPEQQ 233

I || I I I I I I I I II I I I I I II II II I I I I I I M I I M I I Ml 111:11111 
orf98ng IMVKKSDVRELDMSVDEALKYVISLGMVIPDDLPVKTLAGPMPPEKAELPEQQ 233 

The complete length ORF98ng nucleotide sequence <SEQ ID 745> is predicted to encode a protein
having amino acid sequence <SEQ ID 746>: 

1 MTEPAAEGGK AAKALKKYLI TGILVWLPIA VTVWVVSYIV SASDQLVNLL

51 PKQWRPQYVL GFNIPGLGVI VAIAVLFVTG LFAANVLGRQ ILAAWDSLLX

101 RIPVVKSIYS SVKKVSESLL SDSSRSFKTP VLVPFPQSGI WTIAFVSGQV

151 SNAVKAALPQ DGDYLSVYVP TTPNPTGGYY IMVKKSDVRE LDMSVDEALK

201 YVISLGMVIP DDLPVKTLAG PMPPEKAELP EQQ*

Further work revealed the complete nucleotide sequence <SEQ ID 747>: 

1 ATGACGGAAC CTGCGGCCGA AGGCGGCAAA GCTGCCAAGG CGTTAAAAAA 

51 ATATCTGATT ACAGGCATTT TGGTCTGGCT GCCGATTGCG GTAACGGTTT 

101 GGGTGGTTTC CTATATCGTT TCCGCGTCCG ACCAGCTTGT CAACCTGCTG 

151 CCGAAGCAAT GGCGGCCGCA ATATGTTTTG GGGTTTAATA TCCCCGGGCT 

201 CGGCGTTATT GTTGCCATTG CCGTATTGTT TGTAACCGGA TTATTTGCCG 

251 CAAACGTGTT GGGCCGGCAG ATTCTTGCCG CGTGGGACAG CCTGTTgggg 

301 cggaTTCCGG TTGTCAAATC CATCTATTCG AGTGTGAAAA AAGTATCCGA 

351 ATCGCTGCTG TCCGACAGCA GCCGTTCGTT TAAAACGCCG GTACTCGTGC 

401 CGTTTCCCCA ATCGGGTATT TGGACAATCG CATTCGTGTC CGGTCAGGTG 

451 TCGAATGCGG TTAAGGCCGC ATTGCCGCAG GATGGCGATT ATCTTTCCGT 

501 GTATGTCCCG ACCACGCCCA ACCCGACCGG CGGTTACTAT ATTATGGTAA 

551 AGAAAAGCGA TGTGCGCGAA CTCGATATGA GCGTGGACGA AGCGTTGAAA 

601 TATGTGATTT CGCTGGGTAT GGTCATCCCT GACGACCTGC CCGTCAAAAC 

651 ATTGGCAGGA CCTATGCCGC CTGAAAAGGC GGAGTTGCCC GAACAACAAT 

701 AA 

This corresponds to the amino acid sequence <SEQ ID 748; ORF98ng-1>:

1 MTEPAAEGGK AAKALKKYLI TGILVWLPIA VTVWVVSYIV SASDQLVNLL

51 PKQWRPQYVL GFNIPGLGVI VAIAVLFVTG LFAANVLGRQ ILAAWDSLLG

101 RIPVVKSIYS SVKKVSESLL SDSSRSFKTP VLVPFPQSGI WTIAFVSGQV

151 SNAVKAALPQ DGDYLSVYVP TTPNPTGGYY IMVKKSDVRE LDMSVDEALK

201 YVISLGMVIP DDLPVKTLAG PMPPEKAELP EQQ* 

ORF98ng-1 and ORF98-1 show 97.9% identity in 233 aa overlap:

10 20 30 40 50 60

orf98-1.pep MTEXAAEGGKAAKALKKYLITGILVWLPIAVTVWVVSYIVSASDQLVNLLPKQWRPQYVL
Ml M M M M M M I MMMI M MM M M MM IM M Ml M II MUM MM 
orf98ng-1 MTEPAAEGGKAAKALKKYLITGILVWLPIAVTVWVVSYIVSASDQLVNLLPKQWRPQYVL

10 20 30 40 50 60 



70 80 90 100 110 120 

orf98-1.pep GFNIPGLGVIVAIAVLFVTGLFAANVLGRQILAAWDSLLGRIPVVKSIYSSVKKVSESLL
llllllllll MMMI MIMMMIMMI I I I I I I I I I I I I I 
orf98ng-1 GFNIPGLGVIVAIAVLFVTGLFAANVLGRQILAAWDSLLGRIPVVKSIYSSVKKVSESLL
70 80 90 100 110 120

130 140 150 160 170 180

orf98-1.pep SDSSRSFKTPVLVPFPQPGIWTIAFVSGQVSNAVKAALPKDGDYLSVYVPTTPNPTGGYY

t I I ] I I I I I t I 1 J I I I I I I I I I II I I I I I I I I I I I I I I: I I I I I I I I I I I I I I I I I M I
orf98ng-1 SDSSRSFKTPVLVPFPQSGIWTIAFVSGQVSNAVKAALPQDGDYLSVYVPTTPNPTGGYY

130 140 150 160 170 180

190 200 210 220 230

orf98-1.pep IMVKKSDVRELDMSVDEALKYVISLGMVIPDDLPVKTLAGPMPSEKADLPEQQX

I M I I II I I I I I It M I I I I I I I I I I I II M I I I I I i I I I I I I 111:111111
orf98ng-1 IMVKKSDVRELDMSVDEALKYVISLGMVIPDDLPVKTLAGPMPPEKAELPEQQX
190 200 210 220 230

Based on this analysis, including the fact that the putative transmembrane domains in the
gonococcal protein are identical to the sequences in the meningococcal protein, it is predicted that
the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be useful antigens
for vaccines or diagnostics, or for raising antibodies.



Example 89 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 749>:

1 ATgAAAACGG TAGTCTGGAT TGTCGTCCTG TTTGCCGCCG CCGTCGGACT 

51 GGCGCTGGCT TCGGGCATTT ACACCGGCGA CGTGTATATC GTACTCGGAC 

101 AGACCATGCT CAGAATCAAC CTGCACGCCT TTGTGTTAGG TTCGCTGATT 

151 GCCGTCGTGG TGTGGTATTT CTTGTTTAAA TTCATTATCG GsGgTACTCA 

201 ATATCCCCGA AAAGATGCAG CGTTTCGGTT CGGCnCGTAA AGGCCkCAAG 

251 ssCGsGCTTG CCTTGAACAA GGCGGGTTTG GCGTATTTTG AAGGGCGTTT 

301 TGAAAAGGCG GAACTAGAAG CCTCACGCGT GTTGGTCAAC AAAGtAGGCC 

351 G^gAGACAAC CGGACTTTGG CATTGATGCT GrGCGCGCAC GCCGCCGGAC 

401 AGATGGAAAA CATCGAssTG CGCGACCGTT ATCTTGCGGA AATCGCCAAA 

451 CTGCCGGAAA AACAGCAGCT TTCCCGTTAT CTTTTGTTGG CGGAATCGGC 

501 GTTGAACCGG CGCGATTACG AAGCGGCGGA AGCCAATCTT CATGCGGCGG 

551 CGAAGATGAA TGCCAACCTT ACGCGCCTCG TGCGTCTGCA . ATTCGTTAC 

601 GCTTTCGACA GGGGCGACGC GTTGCAGGTT CTGGCAAAAA CCGAAAAACT 

651 TTCCAAGGCG GGCGCGTTGG GCAAATCGGA AATGGAACGG TATCAAAATT 

701 GGGCATATCC GTCGCCAGCT GGCGGATGCT GCCGATGCCG CCGCTTTGAA 

751 AACCTGCCTG AAGCGGATTC CCGACAGCCT CAAAAACGGG GAATTGAGCG 

801 TATCGGTTGC GGAAAAGTAC GAACGTTTGG GACTGTATGC CGATGCGGTC 

851 AAATGGGTCA AACAGCATTA TCCGCAsAAC CGCCGCCCCG AGCTTTTGGA 

901 AGCCTTTGTC GAAAGCGTGC GCTTTTTGGG CGAGCGCGAA CAGCAGAAAG 

951 CCATCGATTT TGCCGATGCT TGGCTGAAAG AACAGCCCGA TAACGCGCTT 

1001 CTGCTGATGT ATCTCGGTCG GCTCGCCTTC GGCCGCAAAC TTTGGGGCAA 

1051 GGCAAAAGGC TACCTTGAAG CGAGCATTGC ATTAAAGCCG AGTATTTCCG 

1101 CGCGTTTGGT TCTAACAAAG GTTTTCGACG AAATCGGAGA ACCGCAGAAG 

1151 GCGGAGGCGC AC... 

This corresponds to the amino acid sequence <SEQ ID 750; ORF100>:

1 MKTVVWIVVL FAAAVGLALA SGIYTGDVYI VLGQTMLRIN LHAFVLGSLI

51 AVVVWYFLFK FIIGVLNIPE KMQRFGSARK GXKXXLALNK AGLAYFEGRF

101 EKAELEASRV LVNKVGRDNR TLALMLXAHA AGQMENIXXR DRYLAEIAKL 

151 PEKQQLSRYL LLAESALNRR DYEAAEANLH AAAKMNANLT RLVRLXIRYA 

201 FDRGDALQVL AKTEKLSKAG ALGKSEMERY QNWAYRRQLA DAADAAALKT 

251 CLKRIPDSLK NGELSVSVAE KYERLGLYAD AVKWVKQHYP XNRRPELLEA 

301 FVESVRFLGE REQQKAIDFA DAWLKEQPDN ALLLMYLGRL AFGRKLWGKA 

351 KGYLEASIAL KPSISARLVL TKVFDEIGEP QKAEAH... 

Further work revealed the complete nucleotide sequence <SEQ ID 751>: 

1 ATGAAAACGG TAGTCTGGAT TGTCGTCCTG TTTGCCGCCG CCGTCGGACT 

51 GGCGCTGGCT TCGGGCATTT ACACCGGCGA CGTGTATATC GTACTCGGAC 

101 AGACCATGCT CAGAATCAAC CTGCACGCCT TTGTGTTAGG TTCGCTGATT 

151 GCCGTCGTGG TGTGGTATTT CTTGTTTAAA TTCATTATCG GCGTACTCAA 
201 TATCCCCGAA AAGATGCAGC GTTTCGGTTC GGCGCGTAAA GGCCGCAAGG 

251 CCGCGCTTGC CTTGAACAAG GCGGGTTTGG CGTATTTTGA AGGGCGTTTT 

301 GAAAAGGCGG AACTAGAAGC CTCACGCGTG TTGGTCAACA AAGAGGCCGG 

351 AGACAACCGG ACTTTGGCAT TGATGCTGGG CGCGCACGCC GCCGGACAGA 

401 TGGAAAACAT CGAGCTGCGC GACCGTTATC TTGCGGAAAT CGCCAAACTG 

451 CCGGAAAAAC AGCAGCTTTC CCGTTATCTT TTGTTGGCGG AATCGGCGTT 

501 GAACCGGCGC GATTACGAAG CGGCGGAAGC CAATCTTCAT GCGGCGGCGA 

551 AGATGAATGC CAACCTTACG CGCCTCGTGC GTCTGCAACT TCGTTACGCT 

601 TTCGACAGGG GCGACGCGTT GCAGGTTCTG GCAAAAACCG AAAAACTTTC 

651 CAAGGCGGGC GCGTTGGGCA AATCGGAAAT GGAACGGTAT CAAAATTGGG 

701 GATACCGCCG CCAGCTGGCG GATGCTGCCG ATGCCGCCGC TTTGAAAACC 

751 TGCCTGAAGC GGATTCCCGA CAGCCTCAAA AACGGGGAAT TGAGCGTATC 

801 GGTTGCGGAA AAGTACGAAC GTTTGGGACT GTATGCCGAT GCGGTCAAAT 

851 GGGTCAAACA GCATTATCCG CACAACCGCC GCCCCGAGCT TTTGGAAGCC 

901 TTTGTCGAAA GCGTGCGCTT TTTGGGCGAG CGCGAACAGC AGAAAGCCAT 

951 CGATTTTGCC GATGCTTGGC TGAAAGAACA GCCCGATAAC GCGCTTCTGC 

1001 TGATGTATCT CGGTCGGCTC GCCTACGGCC GCAAACTTTG GGGCAAGGCA 

1051 AAAGGCTACC TTGAAGCGAG CATTGCATTA AAGCCGAGTA TTTCCGCGCG 

1101 TTTGGTTCTA GCAAAGGTTT TCGACGAAAT CGGAGAACCG CAGAAGGCGG 

1151 AGGCGCAGCG CAACTTGGTT TTGGAAGCCG TCTCCGATGA CGAACGTCAC 

1201 GCAGCGTTAG AGCAGCATAG CTGA 

This corresponds to the amino acid sequence <SEQ ID 752; ORF100-1>: 



1 MKTVVWIVVL FAAAVGLALA SGIYTGDVYI VLGQTMLRIN LHAFVLGSLI 

51 AVVVWYFLFK FIIGVLNIPE KMQRFGSARK GRKAALALNK AGLAYFEGRF 

101 EKAELEASRV LVNKEAGDNR TLALMLGAHA AGQMENIELR DRYLAEIAKL 

151 PEKQQLSRYL LLAESALNRR DYEAAEANLH AAAKMNANLT RLVRLQLRYA 

201 FDRGDALQVL AKTEKLSKAG ALGKSEMERY QNWAYRRQLA DAADAAALKT 

251 CLKRIPDSLK NGELSVSVAE KYERLGLYAD AVKWVKQHYP HNRRPELLEA 

301 FVESVRFLGE REQQKAIDFA DAWLKEQPDN ALLLMYLGRL AYGRKLWGKA 

351 KGYLEASIAL KPSISARLVL AKVFDEIGEP QKAEAQRNLV LEAVSDDERH 

401 AALEQHS* 

Computer analysis of this amino acid sequence gave the following results: 



Homology with a predicted ORF from N. meningitidis (strain A) 

ORF100 shows 93.5% identity over a 386 aa overlap with an ORF (ORF100a) from strain A of N. meningitidis: 

10 20 30 40 50 60 

orf100.pep MKTVVWIVVLFAAAVGLALASGIYTGDVYIVLGQTMLRINLHAFVLGSLIAVVVWYFLFK 
I | I I I I I I I I I I I I I I I I I I I I I I I 1 I I I I I I I I II I I I I I I I I I I I ! I I I I I I I I 1 I 
orf100a    MKTVVWIVVLFAAAXGLALASGIXTGDVYIVLGQTMLRINLHAFVLGSLIAVVVWYFLFK 

10 20 30 40 50 60 

70 80 90 100 110 120 

orf100.pep FIIGVLNIPEKMQRFGSARKGXKXXLALNKAGLAYFEGRFEKAELEASRVLVNKVGRDNR 
I I I I I I I I I I I I I I I I II 1 I I I I I I I I I I I I I I I I I I I I I I I I I I I I M : Ml 
orf100a    FIIGVLNXPEKMQRFGSARKGRKAALALNKAGLAYFEGRFEKAELEASRVLGNKEAGDNR 

70 80 90 100 110 120 

130 140 150 160 170 180 

orf100.pep TLALMLXAHAAGQMENIXXRDRYLAEIAKLPEKQQLSRYLLLAESALNRRDYEAAEANLH 
I I 1 I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
orf100a    TLALMLGAHAAGQMENIELRDRYLAEIAKLPEKQQLSRYLLLAESALNRRDYEAAEANLH 

130 140 150 160 170 180 

190 200 210 220 230 240 

orf100.pep AAAKMNANLTRLVRLXIRYAFDRGDALQVLAKTEKLSKAGALGKSEMERYQNWAYRRQLA 
I I I I I I M I I I I I I I : I I I i I I I I I I I I M I I I I Mill I I I M I I I II II I II II 
orf100a    AAAKMNANLTRLVRLQLRYAFDRGDALQVLAKTEKXSKAGAXGKSEMERYQNWAYRRQLX 

190 200 210 220 230 240 

250 260 270 280 290 300 

orf100.pep DAADAAALKTCLKRIPDSLKNGELSVSVAEKYERLGLYADAVKWVKQHYPXNRRPELLEA 
I I I I I I I II 1 I I II I II I M M II I I II II I M M I I M I II M I I ill 
orf100a    DAADAAALKTCLKRIPDSLKNGELSVSVAEKYERLGLYADAVKWVKQHYPHNRRPELLEA 

250 260 270 280 290 300 

310 320 330 340 350 360 

orf100.pep FVESVRFLGEREQQKAIDFADAWLKEQPDNALLLMYLGRLAFGRKLWGKAKGYLEASIAL 
I I I I I I I I I I I : I I I I I I I I I I I I I I I I I I I I I I I I I i I I : I I I I I I I I I I I I I I I I I I 
orf100a    FVESVRFLGERDQQKAIDFADAWLKEQPDNALLLXYLGRLAYGRKLWGKAKGYLEASIAL 

310 320 330 340 350 360 

370 380 
orf100.pep KPSISARLVLTKVFDEIGEPQKAEAH 
I I I I I I I I II : I I I I I I I I I I I I I : 
orf100a    KPSISARLVLAKVFDETGEPQKAEAQRNLVLASVAEENRPSAETHX 

370 380 390 400 

The complete length ORF100a nucleotide sequence <SEQ ID 753> is: 

1 ATGAAAACGG TAGTCTGGAT TGTCGTCCTG TTTGCCGCCG CNNTCGGGCT 

51 GGCATTGGCG TCGGGCATTN ACACCGGCGA CGTGTATATC GTACTCGGAC 

101 AGACCATGCT CAGAATCAAC CTGCACGCCT TTGTGTTAGG TTCGCTGATT 

151 GCCGTCGTGG TGTGGTATTT CCTGTTCAAA TTCATCATCG GCGTACTCAA 

201 TANCCCCGAA AAGATGCAGC GTTTCGGTTC GGCGCGTAAA GGCCGCAAGG 

251 CCGCGCTTGC TTTGAACAAG GCGGGTTTGG CGTATTTTGA AGGGCGTTTT 

301 GAAAAGGCGG AACTTGAAGC CTCGCGCGTA TTGGGAAACA AAGAGGCGGG 

351 GGATAACCGG ACTTTGGCAT TGATGTTGGG CGCACATGCC GCCGGGCAGA 

401 TGGAAAACAT CGAGCTGCGC GACCGTTATC TTGCGGAAAT CGCCAAACTG 

451 CCGGAAAAGC AGCAGCTTTC CCGTTATCTT TTGTTGGCGG AATCGGCGTT 

501 GAACCGGCGC GATTACGAAG CGGCGGAAGC CAATCTTCAT GCGGCGGCGA 

551 AGATGAATGC CAACCTTACG CGCCTCGTGC GTCTGCAACT TCGTTACGCT 

601 TTCGACAGGG GCGACGCGTT GCAGGTTCTG GCAAAAACCG AAAAANTTTC 

651 CAAGGCGGGC GCGTNGGGCA AATCGGAAAT GGAACGGTAT CAAAATTGGG 

701 CATACCGCCG CCAGCTGNCG GATGCTGCCG ATGCCGCCGC TTTGAAAACC 

751 TGCCTGAAGC GGATTCCCGA CAGCCTCAAA AACGGGGAAT TGAGCGTATC 

801 GGTTGCGGAA AAGTACGAAC GTTTGGGACT GTATGCCGAT GCGGTCAAAT 

851 GGGTCAAACA GCATTATCCG CACAACCGCC GACCCGAACT TTTGGAAGCN 

901 TTTGTCGAAA GCGTGCGCTT TTTGGGCGAA CGCGATCAGC AGAAAGCCAT 

951 CGATTTTGCC GATGCTTGGC TGAAAGAACA GCCCGATAAT GCGCTTCTGC 

1001 TGANGTATCT CGGTCGGCTC GCCTACGGCC GCAAACTTTG GGGCAAGGCA 

1051 AAAGGCTACC TTGAAGCGAG CATTGCATTA AAGCCGAGTA TTTCCGCGCG 

1101 TTTGGTTCTG GCAAAGGTTT TTGACGAAAC CGGAGAACCG CAGAAGGCGG 

1151 AGGCGCAGCG CAACTTGGTT TTGGCAAGCG TTGCCGAGGA AAACCGNCCT 

1201 TCCGCCGAAA CCCATTGA 



This encodes a protein having amino acid sequence <SEQ ID 754>: 



1 MKTVVWIVVL FAAAXGLALA SGIXTGDVYI VLGQTMLRIN LHAFVLGSLI 

51 AVVVWYFLFK FIIGVLNXPE KMQRFGSARK GRKAALALNK AGLAYFEGRF 

101 EKAELEASRV LGNKEAGDNR TLALMLGAHA AGQMENIELR DRYLAEIAKL 

151 PEKQQLSRYL LLAESALNRR DYEAAEANLH AAAKMNANLT RLVRLQLRYA 

201 FDRGDALQVL AKTEKXSKAG AXGKSEMERY QNWAYRRQLX DAADAAALKT 

251 CLKRIPDSLK NGELSVSVAE KYERLGLYAD AVKWVKQHYP HNRRPELLEA 

301 FVESVRFLGE RDQQKAIDFA DAWLKEQPDN ALLLXYLGRL AYGRKLWGKA 

351 KGYLEASIAL KPSISARLVL AKVFDETGEP QKAEAQRNLV LASVAEENRP 

401 SAETH* 

ORF100a and ORF100-1 show 95.1% identity in 406 aa overlap: 



10 20 30 40 50 60 

orf100a.pep MKTVVWIVVLFAAAXGLALASGIXTGDVYIVLGQTMLRINLHAFVLGSLIAVVVWYFLFK 
I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I M 
orf100-1    MKTVVWIVVLFAAAVGLALASGIYTGDVYIVLGQTMLRINLHAFVLGSLIAVVVWYFLFK 

10 20 30 40 50 60 

70 80 90 100 110 120 

orf100a.pep FIIGVLNXPEKMQRFGSARKGRKAALALNKAGLAYFEGRFEKAELEASRVLGNKEAGDNR 
I I M I I I I I I I I I I I I I I I I I II I I I I I I I I I II I I I I I I I I I I M I I I I I I I I I I II 
orf100-1    FIIGVLNIPEKMQRFGSARKGRKAALALNKAGLAYFEGRFEKAELEASRVLVNKEAGDNR 

70 80 90 100 110 120 

130 140 150 160 170 180 

orf100a.pep TLALMLGAHAAGQMENIELRDRYLAEIAKLPEKQQLSRYLLLAESALNRRDYEAAEANLH 
I I 1 1 I t 1 I I I I I I I 1 I 1 I I I 1 t I I 1 I I I t K I 1 I I I I 1 I I t I I I I I I I I I I I I I I I I t I I I 
orf100-1    TLALMLGAHAAGQMENIELRDRYLAEIAKLPEKQQLSRYLLLAESALNRRDYEAAEANLH 

130 140 150 160 170 180 

190 200 210 220 230 240 

orf100a.pep AAAKMNANLTRLVRLQLRYAFDRGDALQVLAKTEKXSKAGAXGKSEMERYQNWAYRRQLX 
I I I I I I I I I I I I I I I I I I i I I I I I I I I I I I I I I I I Mill I I I I I I I I ! I I I I I I I I 
orf100-1    AAAKMNANLTRLVRLQLRYAFDRGDALQVLAKTEKLSKAGALGKSEMERYQNWAYRRQLA 

190 200 210 220 230 240 

250 260 270 280 290 300 

orf100a.pep DAADAAALKTCLKRIPDSLKNGELSVSVAEKYERLGLYADAVKWVKQHYPHNRRPELLEA 
I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 1 I I I I I I I I I I I I I I I 
orf100-1    DAADAAALKTCLKRIPDSLKNGELSVSVAEKYERLGLYADAVKWVKQHYPHNRRPELLEA 

250 260 270 280 290 300 

310 320 330 340 350 360 

orf100a.pep FVESVRFLGERDQQKAIDFADAWLKEQPDNALLLXYLGRLAYGRKLWGKAKGYLEASIAL 
I I I I I M I I I I : I I I I I II I I I I I I I II I I I I I I II I II I I II I till I I INI II I II 
orf100-1    FVESVRFLGEREQQKAIDFADAWLKEQPDNALLLMYLGRLAYGRKLWGKAKGYLEASIAL 

310 320 330 340 350 360 

370 380 390 400 

orf100a.pep KPSISARLVLAKVFDETGEPQKAEAQRNLVLASVAEENRPSA-ETHX 
I I I I I I I I I I I I I I I I I I I I I I I I II I I I I : I : : : : i : I I I 
orf100-1    KPSISARLVLAKVFDEIGEPQKAEAQRNLVLEAVSDDERHAALEQHSX 

370 380 390 400 

Homology with a predicted ORF from N. gonorrhoeae 

ORF100 shows 93.3% identity over a 386 aa overlap with a predicted ORF (ORF100ng) from N. gonorrhoeae: 

orf100.pep MKTVVWIVVLFAAAVGLALASGIYTGDVYIVLGQTMLRINLHAFVLGSLIAVVVWYFLFK 60 
I I I I I i M I I I II I I i I I I i I I I I I I I I I 1 I I I I I I I I I 9 I I I I I I I I M I I I I I I I I I I 
orf100ng   MKTVVWIVVLFAAAVGLALASGIYTGDVYIVLGQTMLRINLHAFVLGSLIAVVVWYFLFK 60 

orf100.pep FIIGVLNIPEKMQRFGSARKGXKXXLALNKAGLAYFEGRFEKAELEASRVLVNKVGRDNR 120 
1111111111:1:1 I I I I I I I I I I I I I I I I I I I I I I I I I I I J I I I I I M : I I I 
orf100ng   FIIGVLNIPENMRRSGSARKGRKAALALNKAGLAYFEGRFEKAELEASRVLGNKEAGDNR 120 

orf100.pep TLALMLXAHAAGQMENIXXRDRYLAEIAKLPEKQQLSRYLLLAESALNRRDYEAAEANLH 180 
I I I ! I I II I I I I I I II I I I I I I I I I I I I I I I I I I I M I I I I I I I I I I I I I I I I I I I I 
orf100ng   TLALMLGAHAAGQMENIELRDRYLAEIAKLPEKQQLSRYLLLAESALNRRDYEAAEANLH 180 

orf100.pep AAAKMNANLTRLVRLXIRYAFDRGDALQVLAKTEKLSKAGALGKSEMERYQNWAYRRQLA 240 
I I I I I I I I I I I I I I I : I I I I I I I I i I I I II I I I I I I I I I I II I I i I I I I I I I I I I I I : I 
orf100ng   AAAKMNANLTRLVRLQLRYAFDRGDALQVLAKTEKLSKAGALGKSEMERYQNWAYRRQMA 240 

orf100.pep DAADAAALKTCLKRIPDSLKNGELSVSVAEKYERLGLYADAVKWVKQHYPXNRRPELLEA 300 
I I I I I I I I I I I I I I I I M I I I I I I I I I I I I I I I I M I 1 I I I I I I t I I J I I II II I III I 
orf100ng   DAADAAALKTCLKRIPDSLKNGELSVSVAEKYERLGLYADAVKWVKQHYPHNRRPELLEA 300 

orf100.pep FVESVRFLGEREQQKAIDFADAWLKEQPDNALLLMYLGRLAFGRKLWGKAKGYLEASIAL 360 
I I I I II I I I I I I I I I I I I I I I : M I I I I I I I I I I 1 I I I I I I : I I I I I I I I I I I I I I I I I I 
orf100ng   FVESVRFLGEREQQKAIDFADSWLKEQPDNALLLMYLGRLAYGRKLWGKAKGYLEASIAL 360 

orf100.pep KPSISARLVLTKVFDEIGEPQKAEAH 386 
I I I I I I M 1 : I I I I I :: Mill: 
orf100ng   KPSIPARLVLAKVFDETAQSQKAEAQRNLVLASVAGENRPSAETR 405 



The complete length ORF100ng nucleotide sequence <SEQ ID 755> is: 

1 ATGAAAACGG TAGTCTGGAT TGTTGTCCTG TTTGCCGCCG CCGTCGGACT 

51 GGCGCTGGCT TCGGGCATTT ACACCGGCGA CGTGTATATC GTACTCGGAC 

101 AGACCATGCT CAGAATCAAC CTGCACGCCT TTGTGTTAGG TTCGCTGATT 

151 GCCGTCGTGG TGTGGTATTT CCTGTTTAAA TTCATCATCG GCGTACTCAA 

201 TATCCCCGAA AATATGCGGC GTTCCGGTTC GGCGCGGAAA GGCCGCAAGG 

251 CCGCGCTTGC CTTGAATAAG GCGGGTTTGG CGTATTTCGA AGGGCGTTTT 

301 GAAAAGGCGG AACTCGAAGC CTCTCGAGTG TTGGGCAACA AAGAGGCCGG 

351 AGACAACCGG ACTTTGGCAT TGATGCTGGG CGCGCACGCG GCAGGACAGA 

401 TGGAAAATAT CGAGCTGCGC GACCGTTATC TTGCGGAAAT CGCCAAACTG 

451 .......... .......... CCGCTATCTT CTGCTGGCGG AATCGGCGTT 

501 .......... .......... CGGCGGAAGC CAATCTTCAT GCGGCGGCGA 

551 .......... .......... CGCCTCGTGC GTCTGCAACT TCGTTACGCC 

601 .......... .......... GCAGGTTCTG GCAAAAaccG AAAAACTTTC 

651 .......... .......... AATCGGAAAT GGAACGGTAT CAAAATTGGG 

701 .......... .......... GATGCTGCCG ATGCCGCCGC TTTGAAAACC 

751 .......... .......... CAGCCTCAAA AACGGGGAAT TGagcGTATC 

801 .......... .......... GTTTGGGACT GTATGCCGAT GCGGTCAAAT 

851 .......... .......... CACAACCGCC GCCCCGAGCT TTTGGAAGCC 

901 TTTGTCGAAA GCGTGCGCTT TTTGGGCGAG CGCGAACAGC AGAAAGCCAT 

951 CGATTTTGCC GATTCTTGGC TGAAAGAACA GCCCGATAAC GCGCTTCTGC 

1001 TGATGTATCT CGGCCGGCTC GCCTACGGCC GCAAACTTTG GGGTAAGGCA 

1051 AAAGGCTACC TTGAAGCGAG TATTGCACTG AAGCCGAGTA TTCCGGCGCG 

1101 TTTGGTGTTG GCAAAGGTTT TTGACGAAAC CGCACAGTCG CAAAAAGCCG 

1151 AAGCACAGCG CAACTTGGTT TTGGCAAGCG TTGCCGGGGA AAACCGCCCT 

1201 TCCGCCGAAA CCCGTTGA 

This encodes a protein having amino acid sequence <SEQ ID 756>: 



1 MKTVVWIVVL FAAAVGLALA SGIYTGDVYI VLGQTMLRIN LHAFVLGSLI 

51 AVVVWYFLFK FIIGVLNIPE NMRRSGSARK GRKAALALNK AGLAYFEGRF 

101 EKAELEASRV LGNKEAGDNR TLALMLGAHA AGQMENIELR DRYLAEIAKL 

151 PEKQQLSRYL LLAESALNRR DYEAAEANLH AAAKMNANLT RLVRLQLRYA 

201 FDRGDALQVL AKTEKLSKAG ALGKSEMERY QNWAYRRQMA DAADAAALKT 

251 CLKRIPDSLK NGELSVSVAE KYERLGLYAD AVKWVKQHYP HNRRPELLEA 

301 FVESVRFLGE REQQKAIDFA DSWLKEQPDN ALLLMYLGRL AYGRKLWGKA 

351 KGYLEASIAL KPSIPARLVL AKVFDETAQS QKAEAQRNLV LASVAGENRP 

401 SAETR* 

ORF100ng and ORF100-1 show 95.3% identity in 402 aa overlap: 



10 20 30 40 50 60 

orf100-1.pep MKTVVWIVVLFAAAVGLALASGIYTGDVYIVLGQTMLRINLHAFVLGSLIAVVVWYFLFK 
I I I I II IIMI II III lllllll II III II til II II II I Ml II I I I II IIIIMI III 
orf100ng     MKTVVWIVVLFAAAVGLALASGIYTGDVYIVLGQTMLRINLHAFVLGSLIAVVVWYFLFK 

10 20 30 40 50 60 

70 80 90 100 110 120 

orf100-1.pep FIIGVLNIPEKMQRFGSARKGRKAALALNKAGLAYFEGRFEKAELEASRVLVNKEAGDNR 
I I I I I I I I I I : I : I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I 
orf100ng     FIIGVLNIPENMRRSGSARKGRKAALALNKAGLAYFEGRFEKAELEASRVLGNKEAGDNR 

70 80 90 100 110 120 

130 140 150 160 170 180 

orf100-1.pep TLALMLGAHAAGQMENIELRDRYLAEIAKLPEKQQLSRYLLLAESALNRRDYEAAEANLH 
I 1 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I II I I II I I I I I I I I I I I I I I 
orf100ng     TLALMLGAHAAGQMENIELRDRYLAEIAKLPEKQQLSRYLLLAESALNRRDYEAAEANLH 

130 140 150 160 170 180 

190 200 210 220 230 240 

orf100-1.pep AAAKMNANLTRLVRLQLRYAFDRGDALQVLAKTEKLSKAGALGKSEMERYQNWAYRRQLA 
I I I I I I I I I I I I I I I 1 1 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 1 1 I I I I I : I 
orf100ng     AAAKMNANLTRLVRLQLRYAFDRGDALQVLAKTEKLSKAGALGKSEMERYQNWAYRRQMA 

190 200 210 220 230 240 

250 260 270 280 290 300 

orf100-1.pep DAADAAALKTCLKRIPDSLKNGELSVSVAEKYERLGLYADAVKWVKQHYPHNRRPELLEA 
I I I I i I I I I I I I I II I I I I I I M I I I 1 M I I I I IE I I I I I I M I 1 I I I I I I I M I I I I I I 
orf100ng     DAADAAALKTCLKRIPDSLKNGELSVSVAEKYERLGLYADAVKWVKQHYPHNRRPELLEA 

250 260 270 280 290 300 

310 320 330 340 350 360 

orf100-1.pep FVESVRFLGEREQQKAIDFADAWLKEQPDNALLLMYLGRLAYGRKLWGKAKGYLEASIAL 
I I I I I I I I I I I I I I I I I I I I I : I I I I I I I I I II I I I I I I I I I I II I I I I I I I I I II I I I I 
orf100ng     FVESVRFLGEREQQKAIDFADSWLKEQPDNALLLMYLGRLAYGRKLWGKAKGYLEASIAL 

310 320 330 340 350 360 

370 380 390 400 

orf100-1.pep KPSISARLVLAKVFDEIGEPQKAEAQRNLVLEAVSDDERHAALEQHSX 
1 I I I I I I I I I I I I I I :: I I I I I I I I I I I : I : - I ' I 
orf100ng     KPSIPARLVLAKVFDETAQSQKAEAQRNLVLASVAGENRPSAETRX 






370 380 390 400 

Based on this analysis, including the presence of a putative leader sequence, a putative 
transmembrane domain, and an RGD motif, it is predicted that the proteins from N. meningitidis and 
N. gonorrhoeae, and their epitopes, could be useful antigens for vaccines or diagnostics, or for 
raising antibodies. 
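The predictions of a leader sequence, a transmembrane domain and an RGD motif all come from scanning the sequence. The Python sketch below is an illustration only and is not the analysis the applicants ran. It applies two common screens to ORF100-1 <SEQ ID 752>. The first is a literal search for the tripeptide RGD. The second is a Kyte-Doolittle hydropathy scan with a 19-residue window, where an average above about 1.6 is a widely used rule of thumb for a candidate membrane-spanning or signal-peptide segment. The window size, the threshold and the helper names (`motif_positions`, `hydropathy_windows`) are assumptions of this sketch.

```python
import re

# Kyte-Doolittle hydropathy scale; positive values are hydrophobic.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# ORF100-1 (SEQ ID 752) as printed above, stop codon omitted, in 10-residue blocks.
ORF100_1 = "".join([
    "MKTVVWIVVL", "FAAAVGLALA", "SGIYTGDVYI", "VLGQTMLRIN", "LHAFVLGSLI",
    "AVVVWYFLFK", "FIIGVLNIPE", "KMQRFGSARK", "GRKAALALNK", "AGLAYFEGRF",
    "EKAELEASRV", "LVNKEAGDNR", "TLALMLGAHA", "AGQMENIELR", "DRYLAEIAKL",
    "PEKQQLSRYL", "LLAESALNRR", "DYEAAEANLH", "AAAKMNANLT", "RLVRLQLRYA",
    "FDRGDALQVL", "AKTEKLSKAG", "ALGKSEMERY", "QNWAYRRQLA", "DAADAAALKT",
    "CLKRIPDSLK", "NGELSVSVAE", "KYERLGLYAD", "AVKWVKQHYP", "HNRRPELLEA",
    "FVESVRFLGE", "REQQKAIDFA", "DAWLKEQPDN", "ALLLMYLGRL", "AYGRKLWGKA",
    "KGYLEASIAL", "KPSISARLVL", "AKVFDEIGEP", "QKAEAQRNLV", "LEAVSDDERH",
    "AALEQHS",
])


def motif_positions(seq: str, motif: str) -> list[int]:
    """1-based start positions of every (possibly overlapping) match of motif."""
    return [m.start() + 1 for m in re.finditer(f"(?={re.escape(motif)})", seq)]


def hydropathy_windows(seq: str, window: int = 19) -> list[float]:
    """Mean hydropathy of each window; unknown residues such as X count as 0."""
    values = [KYTE_DOOLITTLE.get(aa, 0.0) for aa in seq]
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]


if __name__ == "__main__":
    print("RGD starts at residue(s):", motif_positions(ORF100_1, "RGD"))
    windows = hydropathy_windows(ORF100_1)
    starts = [i + 1 for i, score in enumerate(windows) if score > 1.6]
    print("19-residue windows above 1.6 start at:", starts)
```

For ORF100-1 the RGD scan finds a single site, starting at residue 203 within ...FDRGDALQVL.... The first 19-residue window, covering the hydrophobic N-terminus MKTVVWIVVLFAAAVGLAL, already averages above 1.6.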



Example 90 

The following DNA sequence, believed to be complete, was identified in N. meningitidis <SEQ ID 
757>: 

1 ATGATGTTTT CTTGGTTCAA GCTGTTTCAC TTGTTTTTTG TCATTTCGTG 

51 GTTTGCAGGG CTGTTTTACC TGCCGAGGAT TTTCGTCAAT ATGGCGATGA 

101 TTGATGTGCC GCGCGGCAAT CCCGAGTATG TGCGTCTGTC GGGCATGGCG 

151 GTGCGGCTGT ACCGTTTTAT GTCGCCGTTG GGCTTCGGCG CGGTCGTGTT 

201 CGGCGCGGCG ATACCGTTTG CCGCCGGCTG GTGGGGCAGC GGCTGGGTAC 

251 ACGTCAAACT GTGTTTGGGC TTGATGCTCT TGGCTTACCA GTTGTATTGC 

301 GGCGTGCTGC TGCGCCGTTT TCAGGATTAC AGCAATGCTT TTTCACACCG 

351 CTGGTACCGC GTGTTCAACG AAATCCCCGT GCTGCTGATG GTTGCCGCGC 

401 TGTATsTGGT CGTGTTCAAA CCGTTTTGA 

This corresponds to the amino acid sequence <SEQ ID 758; ORF102>: 

1 MMFSWFKLFH LFFVISWFAG LFYLPRIFVN MAMIDVPRGN PEYVRLSGMA 
51 VRLYRFMSPL GFGAVVFGAA IPFAAGWWGS GWVHVKLCLG LMLLAYQLYC 
101 GVLLRRFQDY SNAFSHRWYR VFNEIPVLLM VAALYXVVFK PF* 

Further work revealed the complete nucleotide sequence <SEQ ID 759>: 

1 ATGATGTTTT CTTGGTTCAA GCTGTTTCAC TTGTTTTTTG TCATTTCGTG 

51 GTTTGCAGGG CTGTTTTACC TGCCGAGGAT TTTCGTCAAT ATGGCGATGA 

101 TTGATGTGCC GCGCGGCAAT CCCGAGTATG TGCGTCTGTC GGGCATGGCG 

151 GTGCGGCTGT ACCGTTTTAT GTCGCCGTTG GGCTTCGGCG CGGTCGTGTT 

201 CGGCGCGGCG ATACCGTTTG CCGCCGGCTG GTGGGGCAGC GGCTGGGTAC 

251 ACGTCAAACT GTGTTTGGGC TTGATGCTCT TGGCTTACCA GTTGTATTGC 

301 GGCGTGCTGC TGCGCCGTTT TCAGGATTAC AGCAATGCTT TTTCACACCG 

351 CTGGTACCGC GTGTTCAACG AAATCCCCGT GCTGCTGATG GTTGCCGCGC 

401 TGTATCTGGT CGTGTTCAAA CCGTTTTGA 

This corresponds to the amino acid sequence <SEQ ID 760; ORF102-1>: 

1 MMFSWFKLFH LFFVISWFAG LFYLPRIFVN MAMIDVPRGN PEYVRLSGMA 
51 VRLYRFMSPL GFGAVVFGAA IPFAAGWWGS GWVHVKLCLG LMLLAYQLYC 
101 GVLLRRFQDY SNAFSHRWYR VFNEIPVLLM VAALYLVVFK PF* 
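Each step described as "This corresponds to the amino acid sequence" is a translation using the standard genetic code. The sketch below is illustrative and is not the applicants' software. Its hypothetical helpers `parse_listing` and `translate` strip the position numbers and spaces from a listing and then read the sequence in frame from the first base. Any codon containing an ambiguity code becomes X, following the X convention of the listings above, and a stop codon becomes *. Applied to <SEQ ID 759>, it should reproduce ORF102-1 <SEQ ID 760>.

```python
import re

# Standard genetic code (NCBI translation table 1); codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def parse_listing(text: str) -> str:
    """Drop row numbers, spaces, dots and line breaks from a numbered listing.

    Upper-casing keeps lowercase a/c/g/t as ordinary bases, while lowercase
    IUPAC ambiguity codes (e.g. 's', 'k', 'n') stay non-ACGT.
    """
    return re.sub(r"[^A-Za-z]", "", text).upper()


def translate(dna: str) -> str:
    """Translate in frame from base 1; codons with non-ACGT bases become 'X'."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE.get(dna[p:p + 3], "X") for p in range(0, usable, 3))


SEQ_ID_759 = """
1 ATGATGTTTT CTTGGTTCAA GCTGTTTCAC TTGTTTTTTG TCATTTCGTG
51 GTTTGCAGGG CTGTTTTACC TGCCGAGGAT TTTCGTCAAT ATGGCGATGA
101 TTGATGTGCC GCGCGGCAAT CCCGAGTATG TGCGTCTGTC GGGCATGGCG
151 GTGCGGCTGT ACCGTTTTAT GTCGCCGTTG GGCTTCGGCG CGGTCGTGTT
201 CGGCGCGGCG ATACCGTTTG CCGCCGGCTG GTGGGGCAGC GGCTGGGTAC
251 ACGTCAAACT GTGTTTGGGC TTGATGCTCT TGGCTTACCA GTTGTATTGC
301 GGCGTGCTGC TGCGCCGTTT TCAGGATTAC AGCAATGCTT TTTCACACCG
351 CTGGTACCGC GTGTTCAACG AAATCCCCGT GCTGCTGATG GTTGCCGCGC
401 TGTATCTGGT CGTGTTCAAA CCGTTTTGA
"""

if __name__ == "__main__":
    dna = parse_listing(SEQ_ID_759)
    print(len(dna), translate(dna))
```

The sketch prints the 429-base length followed by the 143-symbol translation, MMFSWFKLFH...VAALYLVVFKPF*. It writes X even for codons such as GCN, whose amino acid (alanine) is unambiguous. This mirrors the listings above but is stricter than necessary.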

Computer analysis of this amino acid sequence gave the following results: 

Homology with HP1484 hypothetical integral membrane protein of H. pylori (accession number AE000647) 
ORF102 and HP1484 show 33% aa identity in 143 aa overlap: 



orf102    3 FSWFKLFHLFFVISWFAGLFYLPRIFVNMAMIDVPRGNPEYVRLSGMAVRLYRFMSPLGF 62 
            F W K FH+ VISW A LFYLPR+FV A + V++ +LY F++ 
HP1484    8 FLWVKAFHVIAVISWMAALFYLPRLFVYHAENAHKKEFVGWQIQEK--KLYSFIASPAM 65 

orf102   63 GAVVFGAAIPFAAG WWGSGWVHVKLCLGLMLLAYQLYCGVLLRRFQDYSNAFSHRWY 119 
            G + + + GW+H KL L ++LLAY YC +R + + R+Y 
HP1484   66 GFTLITGILMLLIEPTLFKSGGWLHAKLALWLLLAYHFYCKKCMRELEKDPTRRNARFY 125 

orf102  120 RVFNEIPXXXXXXXXXXXXFKPF 142 
            RVFNE P KPF 
HP1484  126 RVFNEAPTILMILIVILVWKPF 148 
Homology with a predicted ORF from N. meningitidis (strain A) 

ORF102 shows 99.3% identity over a 142 aa overlap with an ORF (ORF102a) from strain A of N. meningitidis: 

10 20 30 40 50 60 

orf102.pep MMFSWFKLFHLFFVISWFAGLFYLPRIFVNMAMIDVPRGNPEYVRLSGMAVRLYRFMSPL 
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I ! I I I I I I I I I 1 I I I I I I I I I I I I I I I 
orf102a    MMFSWFKLFHLFFVISWFAGLFYLPRIFVNMAMIDVPRGNPEYVRLSGMAVRLYRFMSPL 

10 20 30 40 50 60 

70 80 90 100 110 120 

orf102.pep GFGAVVFGAAIPFAAGWWGSGWVHVKLCLGLMLLAYQLYCGVLLRRFQDYSNAFSHRWYR 
I I I I I I I I I I I I I I I I I I I I i I I I I I I I I II I I I II I I I I I I I II I I I I I I I I I I I I I I I 
orf102a    GFGAVVFGAAIPFAAGWWGSGWVHVKLCLGLMLLAYQLYCGVLLRRFQDYSNAFSHRWYR 

70 80 90 100 110 120 

130 140 
orf102.pep VFNEIPVLLMVAALYXVVFKPFX 
I I I I I I I I II I I I I I I I I I I II 
orf102a    VFNEIPVLLMVAALYLVVFKPFX 

130 140 

The complete length ORF102a nucleotide sequence <SEQ ID 761> is: 

1 ATGATGTTTT CTTGGTTCAA GCTGTTTCAC TTGTTTTTTG TCATTTCGTG 

51 GTTTGCAGGG CTGTTTTACC TGCCGAGGAT TTTCGTCAAT ATGGCGATGA 

101 TTGATGTGCC GCGCGGCAAT CCCGAGTATG TGCGTCTGTC GGGCATGGCG 

151 GTGCGGCTGT ACCGTTTTAT GTCGCCGTTG GGCTTCGGCG CGGTCGTGTT 

201 CGGCGCGGCG ATACCGTTTG CCGCCGGCTG GTGGGGCAGC GGCTGGGTAC 

251 ACGTCAAACT GTGTTTGGGC TTGATGCTCT TGGCTTACCA GTTGTATTGC 

301 GGCGTGCTGC TGCGCCGTTT TCAGGATTAC AGCAATGCTT TTTCACACCG 

351 CTGGTACCGC GTGTTCAACG AAATCCCCGT GCTGCTGATG GTTGCCGCGC 

401 TGTATCTGGT CGTGTTCAAA CCGTTTTGA 

This encodes a protein having amino acid sequence <SEQ ID 762>: 

1 MMFSWFKLFH LFFVISWFAG LFYLPRIFVN MAMIDVPRGN PEYVRLSGMA 
51 VRLYRFMSPL GFGAVVFGAA IPFAAGWWGS GWVHVKLCLG LMLLAYQLYC 
101 GVLLRRFQDY SNAFSHRWYR VFNEIPVLLM VAALYLVVFK PF* 

ORF102a and ORF102-1 show complete identity in 142 aa overlap: 

10 20 30 40 50 60 

orf102a.pep MMFSWFKLFHLFFVISWFAGLFYLPRIFVNMAMIDVPRGNPEYVRLSGMAVRLYRFMSPL 
I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I II I I I II I I I I I I I I I I I I I I I I I I I I I 
orf102-1    MMFSWFKLFHLFFVISWFAGLFYLPRIFVNMAMIDVPRGNPEYVRLSGMAVRLYRFMSPL 

10 20 30 40 50 60 

70 80 90 100 110 120 

orf102a.pep GFGAVVFGAAIPFAAGWWGSGWVHVKLCLGLMLLAYQLYCGVLLRRFQDYSNAFSHRWYR 
I I I I I I I I I I I I I I I I II I I I I I I I II I I I I I I I I I I II I I I I I I I I I I II I I I I I I I I I 
orf102-1    GFGAVVFGAAIPFAAGWWGSGWVHVKLCLGLMLLAYQLYCGVLLRRFQDYSNAFSHRWYR 

70 80 90 100 110 120 

130 140 
orf102a.pep VFNEIPVLLMVAALYLVVFKPFX 
I I I I I I I I I I I I I I I I II I I I 1 I 
orf102-1    VFNEIPVLLMVAALYLVVFKPFX 

130 140 

Homology with a predicted ORF from N. gonorrhoeae 

ORF102 shows 97.9% identity over a 142 aa overlap with a predicted ORF (ORF102ng) from N. gonorrhoeae: 
orf102.pep MMFSWFKLFHLFFVISWFAGLFYLPRIFVNMAMIDVPRGNPEYVRLSGMAVRLYRFMSPL 60 
I I I II I I Mill I! I 1 I I Ml II II II Ml I II MM M M M I M til I lllil I! I II 
orf102ng   MMFSWFKLFHLFFVISWFAGLFYLPRIFVNMAMIDAPRGNPEYVRLSGMAVRLYRFMSPL 60 

orf102.pep GFGAVVFGAAIPFAAGWWGSGWVHVKLCLGLMLLAYQLYCGVLLRRFQDYSNAFSHRWYR 120 
M M II I M M II II I I I I I I I I I I I I I I I I I I M I I I M I I I I I I I I I I I I I I I I I I I 
orf102ng   GFGAVVFGAAIPFAAGRWGSGWVHVKLCLGLMLLAYQLYCGVLLRRFQDYSNAFSHRWYR 120 

orf102.pep VFNEIPVLLMVAALYXVVFKPF 142 
I I I I I I I I I I M M I I I I I I I 
orf102ng   VFNEIPVLLMVAALYLVVFKPF 142 

The complete length ORF102ng nucleotide sequence <SEQ ID 763> is: 

1 ATGATGTTTT CTTGGTTCAA GCTGTTTCAC TTGTTTTTTG TCATTTCGTG 

51 GTTTGCAGGG CTGTTTTACC TGCCGAGGAT TTTCGTCAAT ATGGCGATGA 

101 TTGATGCGCC GCGCGGCAAT CCCGAGTATG TGCGCCTGTC GGGGATGGCG 

151 GTGCGGTTGT ACCGTTTTAT GTCGCCTTTG GGTTTCGGCG CGGTCGTGTT 

201 CGGCGCGGCG ATACCGTTTG CCGCcggccg GTGGGGCagc ggctggGTTC 

251 ACGTCAAACT GTGTTTGGGC TTGATGCTCT TGGCTTATCA GTTGTATTGC 

301 GGCGTGCTGC TGCGCCGTTT TCAGGATTAC AGCAATGCTT TTTCACACCG 

351 CTGGTACCGC GTGTTCAAcg aAATCCCCGT GCTGCTGATG GTTGCCGCGC 

401 TGTATCTGGT CGTGTTCAAA CCGTTTTGA 

This encodes a protein having amino acid sequence <SEQ ID 764>: 

1 MMFSWFKLFH LFFVISWFAG LFYLPRIFVN MAMIDAPRGN PEYVRLSGMA 
51 VRLYRFMSPL GFGAVVFGAA IPFAAGRWGS GWVHVKLCLG LMLLAYQLYC 
101 GVLLRRFQDY SNAFSHRWYR VFNEIPVLLM VAALYLVVFK PF* 

ORF102ng and ORF102-1 show 98.6% identity in 142 aa overlap: 

10 20 30 40 50 60 

orf102-1.pep MMFSWFKLFHLFFVISWFAGLFYLPRIFVNMAMIDVPRGNPEYVRLSGMAVRLYRFMSPL 
I I I I I II I I I I II I I I I I I I I I II II I I I I I I I I I : I I I I I I I II I I I I II I I M I I I II 
orf102ng     MMFSWFKLFHLFFVISWFAGLFYLPRIFVNMAMIDAPRGNPEYVRLSGMAVRLYRFMSPL 

10 20 30 40 50 60 

70 80 90 100 110 120 

orf102-1.pep GFGAVVFGAAIPFAAGWWGSGWVHVKLCLGLMLLAYQLYCGVLLRRFQDYSNAFSHRWYR 
I I I I I I 1 I I I I I I I I I I I I M I I I I I I I i I I I I I I I I I I I II I I I I I I 1 I I I I i I I I I I 
orf102ng     GFGAVVFGAAIPFAAGRWGSGWVHVKLCLGLMLLAYQLYCGVLLRRFQDYSNAFSHRWYR 

70 80 90 100 110 120 

130 140 
orf102-1.pep VFNEIPVLLMVAALYLVVFKPFX 
I II I I I I I I II I I I I 1 I I I I I I I 
orf102ng     VFNEIPVLLMVAALYLVVFKPFX 

130 140 
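The identity figures quoted for ORF102 and its homologues are the percentage of aligned positions that carry the same residue. These alignments are gap-free and of equal length, so a minimal sketch is enough. It assumes that an ambiguous X never counts as a match, and with that assumption it reproduces the 99.3%, 97.9% and 98.6% values given above. The helper name `percent_identity` is hypothetical. Gapped alignments, such as those involving ORF100a, would need the alignment itself as input rather than the raw sequences.

```python
def percent_identity(a: str, b: str) -> float:
    """Percentage of identical, non-X positions in two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(1 for x, y in zip(a, b) if x == y and x != "X")
    return 100.0 * matches / len(a)


# ORF102-1 (SEQ ID 760) without the stop codon, in 10-residue blocks.
ORF102_1 = "".join([
    "MMFSWFKLFH", "LFFVISWFAG", "LFYLPRIFVN", "MAMIDVPRGN", "PEYVRLSGMA",
    "VRLYRFMSPL", "GFGAVVFGAA", "IPFAAGWWGS", "GWVHVKLCLG", "LMLLAYQLYC",
    "GVLLRRFQDY", "SNAFSHRWYR", "VFNEIPVLLM", "VAALYLVVFK", "PF",
])
# ORF102 (SEQ ID 758) differs only by an ambiguous X at residue 136.
ORF102 = ORF102_1[:135] + "X" + ORF102_1[136:]
# ORF102ng (SEQ ID 764) differs at residues 36 (V -> A) and 77 (W -> R).
ORF102NG = ORF102_1[:35] + "A" + ORF102_1[36:76] + "R" + ORF102_1[77:]

if __name__ == "__main__":
    pairs = {
        "ORF102 vs ORF102a": (ORF102, ORF102_1),
        "ORF102 vs ORF102ng": (ORF102, ORF102NG),
        "ORF102ng vs ORF102-1": (ORF102NG, ORF102_1),
    }
    for name, (a, b) in pairs.items():
        print(f"{name}: {percent_identity(a, b):.1f}%")
```

The three ratios are 141/142, 139/142 and 140/142, which print as 99.3%, 97.9% and 98.6% respectively.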

In addition, ORF102ng shows significant homology to a membrane protein from H. pylori: 

gi|2314656 (AE000647) conserved hypothetical integral membrane protein 
[Helicobacter pylori] Length = 148 
Score = 79.2 bits (192), Expect = 1e-14 

Identities = 50/147 (34%), Positives = 68/147 (46%), Gaps = 13/147 (8%) 



Query:   3 FSWFKLFHLFFVISWFAGLFYLPRIFVNMAMIDAPRGNPEYVRLSGMAVRLYRFMSPLGF 62 
           F W K FH+ VISW A LFYLPR+FV A + V++ +LY F++ 
Sbjct:   8 FLWVKAFHVIAVISWMAALFYLPRLFVYHAENAHKKEFVGWQIQEK--KLYSFIASPAM 65 

Query:  63 ... 115 
           G + + F +G GW+H KL L ++LLAY YC +R + + 
Sbjct:  66 GFTLITGILMLLIEPTLFKSGGWLHAKLALWLLLAYHFYCKKCMRELEKDPTRRN 121 

Query: 116 HRWYRVFNEIPXXXXXXXXXXXXFKPF 142 
           R+YRVFNE P KPF 
Sbjct: 122 ARFYRVFNEAPTILMILIVILVWKPF 148 
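The BLAST summary line expresses identities, positives and gaps as fractions of the 147-position alignment. The printed percentages are consistent with truncating each ratio to a whole number rather than rounding it: 13/147 is about 8.8%, yet it is printed as 8%. The sketch below reproduces the three values on that assumption, which is an inference from these numbers and not a documented description of BLAST's internals. The helper name `truncated_percent` is hypothetical.

```python
def truncated_percent(count: int, length: int) -> int:
    """Whole-number percentage, truncated rather than rounded."""
    return 100 * count // length


ALIGNMENT_LENGTH = 147  # from "Identities = 50/147" in the summary line
for label, count in (("Identities", 50), ("Positives", 68), ("Gaps", 13)):
    pct = truncated_percent(count, ALIGNMENT_LENGTH)
    print(f"{label} = {count}/{ALIGNMENT_LENGTH} ({pct}%)")
```

This prints "Identities = 50/147 (34%)", "Positives = 68/147 (46%)" and "Gaps = 13/147 (8%)". Rounding to the nearest whole number would instead give 9% for the gaps.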








Based on this analysis, it is predicted that these proteins from N. meningitidis and N. gonorrhoeae, 
and their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies. 

Example 91 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 765>: 



1 ATGGCAAAAA TGATGAAATG GGCGGCTGTT GCGGCGGTCG CGGCGGCAGC 

51 GGTTTGGGGC GGATGGTCTT AACTGAAGCC CGAGCCGCAC GTGCTTGATA 

101 TTACGGAAAC GGTCAGGCGC GGC // 

//.. ATTTCGTTTA CGATTTTGTC CGAACCGGAT ACGCCGATTA AGGCGAAGCT 

51 CGACAGCGTC GACCCCGGGC TGACCACGAT GTCGTCGGGC GGTTACAACA 

101 GCAGTACGGA TACGGCTTCC AATGCGGTCT ACTATTATGC CCGTTCGTTT 

151 GTGCCGAATC CGGACGGCAA ACTCGCCACG GGGATGACGA CGCAGAATAC 

201 GGTTGAAATC GACGGCGTGA AAAATGTGCT GATTATTCCG TCGCTGACCG 

251 TGAAAAATCG CGGCGGCAAG GCGTTTGTGC GCGTGTTGGG TGCGGACGGC 

301 AAGGCGGCGG AACGCGAAAT CCGGACCGGT ATGAGAGACA GTATGAATAC 

351 CGAAGTAAAA AGCGGGTTGA AAGAGGGGGA CAAAGTGGTC ATCTCCGAAA 

401 TAACCGCCGC CGAGCAACAG GAAAGCGGCG AACGCGCCCT AGGCGGCCCG 

451 CCGCGCCGAT AA 



This corresponds to the amino acid sequence <SEQ ID 766; ORF85>: 

1 MAKMMKWAAV AAVAAAAVWG GWS.LKPEPH VLDITETVRR G // 

//.. ISFTILSEPDT 
251 PIKAKLDSVD PGLTTMSSGG YNSSTDTASN AVYYYARSFV PNPDGKLATG 
301 MTTQNTVEID GVKNVLIIPS LTVKNRGGKA FVRVLGADGK AAEREIRTGM 
351 RDSMNTEVKS GLKEGDKVVI SEITAAEQQE SGERALGGPP RR* 

Further work revealed the further partial nucleotide sequence <SEQ ID 767>: 

1 ..GTATCGGTCG GCGCGCAGGC ATCGGGGCAG ATTAAGATAC TTTATGTCAA 

51 ACTCGGGCAA CAGGTTAAAA AGGGCGATTT GATTGCGGAA ATCAATTCGA 

101 CCTCGCAGAC CAATACGCTC AATACGGAAA AATCCAAGTT GGAAACGTAT 

151 CAGGCGAAGC TGGTGTCGGC ACAGATTGCA TTGGGCAGCG CGGAGAAGAA 

201 ATATAAGCGT CAGGCGGCGT TATGGAAGGA AAACGCGACT TCCAAAGAGG 

251 ATTTGGAAAG CGCGCAGGAT GCGTTTGCCG CCGCCAAAGC CAATGTTGCC 

301 GAGCTGAAGG CTTTAATCAG ACAGAGCAAA ATTTCCATCA ATACCGCCGA 

351 GTCGGAATTG GGCTACACGC GCATTACCGC AACGATGGAC GGCACGGTGG 

401 TGGCGATTCT CGTGGAAGAG GGGCAGACTG TGAACGCGGC GCAGTCTACG 

451 CCGACGATTG TCCAATTGGC GAATCTGGAT ATGATGTTGA ACAAAATGCA 

501 GATTGCCGAG GGCGATATTA CCAAGGTGAA GGCGGGGCAG GATATTTCGT 

551 TTACGATTTT GTCCGAACCG GATACGCCGA TTAAGGCGAA GCTCGACAGC 

601 GTCGACCCCG GGCTGACCAC GATGTCGTCG GGCGGTTACA ACAGCAGTAC 

651 GGATACGGCT TCCAATGCGG TCTACTATTA TGCCCGTTCG TTTGTGCCGA 

701 ATCCGGACGG CAAACTCGCC ACGGGGATGA CGACGCAGAA TACGGTTGAA 

751 ATCGACGGCG TGAAAAATGT GCTGATTATT CCGTCGCTGA CCGTGAAAAA 

801 TCGCGGCGGC AAGGCGTTTG TGCGCGTGTT GGGTGCGGAC GGCAAGGCGG 

851 CGGAACGCGA AATCCGGACC GGTATGAGAG ACAGTATGAA TACCGAAGTA 

901 AAAAGCGGGT TGAAAGAGGG GGACAAAGTG GTCATCTCCG AAATAACCGC 

951 CGCCGAGCAA CAGGAAAGCG GCGAACGCGC CCTAGGCGGC CCGCCGCGCC 

1001 GATAA 

This corresponds to the amino acid sequence <SEQ ID 768; ORF85-1>: 

1 ..VSVGAQASGQ IKILYVKLGQ QVKKGDLIAE INSTSQTNTL NTEKSKLETY 

51 QAKLVSAQIA LGSAEKKYKR QAALWKENAT SKEDLESAQD AFAAAKANVA 

101 ELKALIRQSK ISINTAESEL GYTRITATMD GTVVAILVEE GQTVNAAQST 

151 PTIVQLANLD MMLNKMQIAE GDITKVKAGQ DISFTILSEP DTPIKAKLDS 

201 VDPGLTTMSS GGYNSSTDTA SNAVYYYARS FVPNPDGKLA TGMTTQNTVE 

251 IDGVKNVLII PSLTVKNRGG KAFVRVLGAD GKAAEREIRT GMRDSMNTEV 

301 KSGLKEGDKV VISEITAAEQ QESGERALGG PPRR* 



Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A) 

ORF85 shows 87.8% identity over a 41 aa overlap and 99.3% identity over a 153 aa overlap with 
an ORF (ORF85a) from strain A of N. meningitidis: 



10 20 30 40 
orf85.pep MAKMMKWAAVAAVAAAAVWGGWS-LKPEPHVLDITETVRRG 
I I I I I I I I I I I I I I I I I I I I I I I Mill:: I I I I 1 I I I 
orf85a    MAKMMKWAAVAAVAAAAVWGGWSYLKPEPQAAYITETVRRGDISRTVSATGEISPSNLVS 
10 20 30 40 50 60 

// 

80 90 100 
orf85.pep                               ISFTILSEPDTPIKAKLDSVDPGLTTMSSG 
I I I I I I i I I I I I I I 1 I I i I I I I I II I I I I I 
orf85a    TIVQLANLDMMLNKMQIAEGDITKVKAGQDISFTILSEPDTPIKAKLDSVDPGLTTMSSG 
210 220 230 240 250 260 

110 120 130 140 150 160 
orf85.pep GYNSSTDTASNAVYYYARSFVPNPDGKLATGMTTQNTVEIDGVKNVLIIPSLTVKNRGGK 
II M II II lllll 1 I MMII II I M II I II II I II I II Ml I I I 11 II I I I llllllt: 
orf85a    GYNSSTDTASNAVYYYARSFVPNPDGKLATGMTTQNTVEIDGVKNVLIIPSLTVKNRGGR 
270 280 290 300 310 320 

170 180 190 200 210 220 
orf85.pep AFVRVLGADGKAAEREIRTGMRDSMNTEVKSGLKEGDKVVISEITAAEQQESGERALGGP 
II II II Ml M II I MMMMIMM I I MM I I MMII I I II II I I Ml llilll II 
orf85a    AFVRVLGADGKAAEREIRTGMRDSMNTEVKSGLKEGDKVVISEITAAEQQESGERALGGP 
330 340 350 360 370 380 

230 
orf85.pep PRRX 
MM 
orf85a    PRRX 
390 



The complete length ORF85a nucleotide sequence <SEQ ID 769> is: 

1 ATGGCAAAAA TGATGAAATG GGCGGCTGTT GCGGCGGTCG CGGCGGCAGC 

51 GGTTTGGGGC GGATGGTCTT ATCTGAAGCC CGAGCCGCAG GCTGCTTATA 

101 TTACGGAAAC GGTCAGGCGC GGCGACATCA GCCGGACGGT TTCTGCAACA 

151 GGGGAGATTT CGCCGTCCAA CCTGGTATCG GTCGGCGCGC AGGCATCGGG 

201 GCAGATTAAG AAACTTTATG TCAAACTCGG GCAACAGGTT AAAAAGGGCG 

251 ATTTGATTGC GGAAATCAAT TCGACCTCGC AGACCAATAC GCTCAATACG 

301 GAAAAATCCA AATTGGAAAC GTATCAGGCG AAGCTGGTGT CGGCACAGAT 

351 TGCATTGGGC AGCGCGGAGA AGAAATATAA GCGTCAGGCG GCGTTGTGGA 

401 AGGATGATGC GACCGCTAAA GAAGATTTGG AAAGCGCACA GGATGCGCTT 

451 GCCGCCGCCA AAGCCAATGT TGCCGAGCTG AAGGCTCTAA TCAGACAGAG 

501 CAAAATTTCC ATCAATACCG CCGAGTCGGA ATTGGGCTAC ACGCGCATTA 

551 CCGCAACGAT GGACGGCACG GTGGTGGCGA TTCTCGTGGA AGAGGGGCAG 

601 ACTGTGAACG CGGCGCAGTC TACGCCGACG ATTGTCCAAT TGGCGAATCT 

651 GGATATGATG TTGAACAAAA TGCAGATTGC CGAGGGCGAT ATTACCAAGG 

701 TGAAGGCGGG GCAGGATATT TCGTTTACGA TTTTGTCCGA ACCGGATACG 

751 CCGATTAAGG CGAAGCTCGA CAGCGTCGAC CCCGGGCTGA CCACGATGTC 

801 GTCGGGCGGC TACAACAGCA GTACGGATAC GGCTTCCAAT GCGGTCTACT 

851 ATTATGCCCG TTCGTTTGTG CCGAATCCGG ACGGCAAACT CGCCACGGGG 

901 ATGACGACGC AGAATACGGT TGAAATCGAC GGTGTGAAAA ATGTGCTGAT 

951 TATTCCGTCG CTGACCGTGA AAAATCGCGG CGGCAGGGCG TTTGTGCGCG 

1001 TGTTGGGTGC AGACGGCAAG GCGGCGGAAC GCGAAATCCG GACCGGTATG 

1051 AGAGACAGTA TGAATACCGA AGTAAAAAGC GGGTTGAAAG AGGGGGACAA 

1101 AGTGGTCATC TCCGAAATAA CCGCCGCCGA GCAGCAGGAA AGCGGCGAAC 

1151 GCGCCCTAGG CGGCCCGCCG CGCCGATAA 



This encodes a protein having amino acid sequence <SEQ ID 770>: 

1 MAKMMKWAAV AAVAAAAVWG GWSYLKPEPQ AAYITETVRR GDISRTVSAT 
51 GEISPSNLVS VGAQASGQIK KLYVKLGQQV KKGDLIAEIN STSQTNTLNT 
101 EKSKLETYQA KLVSAQIALG SAEKKYKRQA ALWKDDATAK EDLESAQDAL 
151 AAAKANVAEL KALIRQSKIS INTAESELGY TRITATMDGT VVAILVEEGQ 
201 TVNAAQSTPT IVQLANLDMM LNKMQIAEGD ITKVKAGQDI SFTILSEPDT 
251 PIKAKLDSVD PGLTTMSSGG YNSSTDTASN AVYYYARSFV PNPDGKLATG 
301 MTTQNTVEID GVKNVLIIPS LTVKNRGGRA FVRVLGADGK AAEREIRTGM 
351 RDSMNTEVKS GLKEGDKVVI SEITAAEQQE SGERALGGPP RR* 

ORF85a and ORF85-1 show 98.2% identity in 334 aa overlap: 

           30        40        50        60        70        80
orf85a.pep PQAAYITETVRRGDISRTVSATGEISPSNLVSVGAQASGQIKKLYVKLGQQVKKGDLIAE
                                         |||||||||||| |||||||||||||||||
orf85-1                                  VSVGAQASGQIKILYVKLGQQVKKGDLIAE
                                                 10        20        30

           90       100       110       120       130       140
orf85a.pep INSTSQTNTLNTEKSKLETYQAKLVSAQIALGSAEKKYKRQAALWKDDATAKEDLESAQD
           ||||||||||||||||||||||||||||||||||||||||||||||::||:|||||||||
orf85-1    INSTSQTNTLNTEKSKLETYQAKLVSAQIALGSAEKKYKRQAALWKENATSKEDLESAQD
                   40        50        60        70        80        90

          150       160       170       180       190       200
orf85a.pep ALAAAKANVAELKALIRQSKISINTAESELGYTRITATMDGTVVAILVEEGQTVNAAQST
           |:||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf85-1    AFAAAKANVAELKALIRQSKISINTAESELGYTRITATMDGTVVAILVEEGQTVNAAQST
                  100       110       120       130       140       150

          210       220       230       240       250       260
orf85a.pep PTIVQLANLDMMLNKMQIAEGDITKVKAGQDISFTILSEPDTPIKAKLDSVDPGLTTMSS
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf85-1    PTIVQLANLDMMLNKMQIAEGDITKVKAGQDISFTILSEPDTPIKAKLDSVDPGLTTMSS
                  160       170       180       190       200       210

          270       280       290       300       310       320
orf85a.pep GGYNSSTDTASNAVYYYARSFVPNPDGKLATGMTTQNTVEIDGVKNVLIIPSLTVKNRGG
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf85-1    GGYNSSTDTASNAVYYYARSFVPNPDGKLATGMTTQNTVEIDGVKNVLIIPSLTVKNRGG
                  220       230       240       250       260       270

          330       340       350       360       370       380
orf85a.pep RAFVRVLGADGKAAEREIRTGMRDSMNTEVKSGLKEGDKVVISEITAAEQQESGERALGG
           :|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf85-1    KAFVRVLGADGKAAEREIRTGMRDSMNTEVKSGLKEGDKVVISEITAAEQQESGERALGG
                  280       290       300       310       320       330

          390
orf85a.pep PPRRX
           |||||
orf85-1    PPRRX
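The identity percentages quoted for these comparisons are identical positions divided by the length of the overlap: the six substitutions in the 334-residue ORF85a/ORF85-1 overlap leave 328 identical residues, and 328/334 rounds to 98.2%. The Python sketch below reproduces that arithmetic for two sequences that are already aligned column by column. The function name `percent_identity`, the rule that gap ('-') columns are left out of the overlap, and the rule that 'X' never counts as identical are assumptions made for illustration, not a description of the alignment program actually used.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> tuple[int, int, float]:
    """Return (identical, overlap, percent) for two pre-aligned sequences.

    Assumption: columns with a gap ('-') in either sequence are excluded from
    the overlap, and 'X' (unknown residue) is never scored as identical.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must be the same length")
    identical = overlap = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # gap column: not counted in the overlap in this sketch
        overlap += 1
        if a == b and a != "X":
            identical += 1
    percent = 100.0 * identical / overlap if overlap else 0.0
    return identical, overlap, percent


# Residues 89-148 of ORF85a against ORF85-1 (three substitutions: D/E, D/N, A/S).
orf85a_row = "INSTSQTNTLNTEKSKLETYQAKLVSAQIALGSAEKKYKRQAALWKDDATAKEDLESAQD"
orf85_1_row = "INSTSQTNTLNTEKSKLETYQAKLVSAQIALGSAEKKYKRQAALWKENATSKEDLESAQD"
print(percent_identity(orf85a_row, orf85_1_row))  # (57, 60, 95.0)

# Whole overlap: six substitutions in 334 aligned residues.
print(f"{100 * (334 - 6) / 334:.1f}%")  # 98.2%
```

Because most alignments in this section are ungapped, the gap rule makes no difference to them; other programs may count the overlap of gapped alignments differently.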

Figure 19D shows plots of hydrophilicity, antigenic index, and AMPHI regions for ORF85a.
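The text does not say which scales, window sizes or programs were used for the plots in Figure 19D. Purely as an illustration of how a per-residue hydropathy profile is commonly produced, the Python sketch below averages Kyte-Doolittle hydropathy values over a sliding window. The choice of scale, the 9-residue default window and the function name `hydropathy_profile` are assumptions; hydrophilicity plots frequently use other scales (for example Hopp-Woods, where positive values mean hydrophilic), and the antigenic-index and AMPHI analyses are not attempted here.

```python
# Kyte-Doolittle hydropathy values (positive = hydrophobic); assumed scale.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein: str, window: int = 9) -> list[float]:
    """Mean hydropathy of every complete window along the sequence.

    Characters outside the scale (e.g. 'X', '*', spaces) are dropped first,
    so window positions refer to the filtered sequence.
    """
    values = [KYTE_DOOLITTLE[aa] for aa in protein.upper() if aa in KYTE_DOOLITTLE]
    return [
        sum(values[start:start + window]) / window
        for start in range(len(values) - window + 1)
    ]


# First 30 residues of ORF85a (SEQ ID 770), including its alanine-rich stretch.
n_terminus = "MAKMMKWAAV AAVAAAAVWG GWSYLKPEPQ"
profile = hydropathy_profile(n_terminus)
peak = max(range(len(profile)), key=profile.__getitem__)
print(f"most hydrophobic 9-residue window starts at residue {peak + 1}: {profile[peak]:.2f}")
```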
Homology with a predicted ORF from N. gonorrhoeae

ORF85 shows a high degree of identity with a predicted ORF (ORF85ng) from N. gonorrhoeae: 



ORF85     1 MAKMMKWAAVAAVAAAAVWGGWS.LKPEPHVLDITETVRRG 40
            ||||||||||||||||||||||| |||||::  |||:||||
ORF85ng   1 MAKMMKWAAVAAVAAAAVWGGWSYLKPEPQAAYITEAVRRGDISRTVSAT 50

ORF85                                                  ISFTILSEPDT 250
                                                       |||||||||||
ORF85ng 201 TVNAAQSTPTIVQLANLDMMLNKMQIAEGDITKVKAGQDISFTILSEPDT 250

ORF85   251 PIKAKLDSVDPGLTTMSSGGYNSSTDTASNAVYYYARSFVPNPDGKLATG 300
            ||||||||||||||||||||||||||||||||||||||||||||||||||
ORF85ng 251 PIKAKLDSVDPGLTTMSSGGYNSSTDTASNAVYYYARSFVPNPDGKLATG 300

ORF85   301 MTTQNTVEIDGVKNVLIIPSLTVKNRGGKAFVRVLGADGKAAEREIRTGM 350
            ||||||||||||||||:||||||||||||||||||||||||:||||||||
ORF85ng 301 MTTQNTVEIDGVKNVLLIPSLTVKNRGGKAFVRVLGADGKAVEREIRTGM 350

ORF85   351 RDSMNTEVKSGLKEGDKVVISEITAAEQQESGERALGGPPRR 393
            :|||||||||||||||||||||||||||||||||||||||||
ORF85ng 351 KDSMNTEVKSGLKEGDKVVISEITAAEQQESGERALGGPPRR 393
The complete length ORF85ng nucleotide sequence <SEQ ID 771> is:

1 ATGGCAAAAA TGATGAAATG GGCGGCTGTT GCGGCGGTCG CGGCGGCaac 

51 GGTTTGGGGC GGATGGTCTT ATCTGAAGCC CGAACCGCAG GCTGCTTATA 

101 TTACGGAaac ggTCAGGCGC GGCGATATCA GCCGGACGGT TTCCGCGACG 

151 GgcgAGATTT CGCCGTCCAA CCTGGTATCG GTCGGCGCGC AGGCTTCGGG 

201 GCAGATTAAA AAGCTTTATG TCAAACTCGG GCAACAGGTC AAAAAGGGCG 

251 ATTTGATTGC GGAAATCAAT TCGACCACGC AGACCAACAC GATCGATATG 

301 GAAAAATCCA AATTGGAAAC GTATCAGGCG AAGCTGGTGT CGGCACAGAT 

351 TGCATTGGGC AGCGCGGAGA AGAAATATAA GCGTCAGGCG GCGTTGTGGA 

401 AGGATGATGC GACCTCTAAA GAAGATTTGG AAAGCGCGCA GGATGCGCTT 

451 GCCGCCGCCA AAGCCAATGT TGCCGAGTTG AAGGCTTTAA TCAGACAGAG

501 CAAAATTTCC ATCAATACCG CCGAGTCGGA TTTGGGCTAC ACGCGCATTA 

551 CCGCGACGAT GGACGGCACG GTGGTGGCGA TTCCCGTGGA AGAGGGGCAG 

601 ACTGTGAACG CGGCGCAGTC TACGCCGACG ATTGTCCAAT TGGCGAATCT 

651 GGATATGATG TTGAACAAAA TGCAGATTGC CGAGGGCGAT ATTACCAAGG 

701 TGAAGGCGGG GCAGGATATT TCGTTTACGA TTTTGTCCGA ACCGGATACG 

751 CCGATTAAGG CGAAGCTCGA CAGCGTCGAC CCCGGGCTGA CCACGATGTC

801 GTCGGGCGGC TACAACAGCA GTACGGATAC GGCTTCCAAT GCGGTCTATT 

851 ATTATGCCCG TTCGTTTGTG CCGAATCCGG ACGGCAAACT CGCCACGGGG 

901 ATGACGACGC AGAATACGGT TGAAATCGAC GGTGTGAAAA ATGTGTTGCT 

951 TATTCCGTCG CTGACCGTGA AAAATCGCGG CGGCAAGGCG TTCGTACGCG 

1001 TGTTGGGTGC GGACGGCAAG GCAGTGGAAC GCGAAATCCG GACCGGTATG 

1051 AAAGACAGTA TGAATACCGA AGTGAAAAGC GGGTTGAAAG AGGGGGACAA 

1101 AGTGGTCATC TCCGAAATAA CCGCCGCCGA GCAGCAGGAA AGCGGCGAAC 

1151 GCGCCCTAGG CGGCCCGCCG CGCCGATAA 

This encodes a protein having amino acid sequence <SEQ ID 772>: 

1 MAKMMKWAAV AAVAAAAVWG GWSYLKPEPQ AAYITEAVRR GDISRTVSAT

51 GEISPSNLVS VGAQASGQIK KLYVKLGQQV KKGDLIAEIN STTQTNTIDM

101 EKSKLETYQA KLVSAQIALG SAEKKYKRQA ALWKDDATSK EDLESAQDAL

151 AAAKANVAEL KALIRQSKIS INTAESDLGY TRITATMDGT VVAIPVEEGQ

201 TVNAAQSTPT IVQLANLDMM LNKMQIAEGD ITKVKAGQDI SFTILSEPDT

251 PIKAKLDSVD PGLTTMSSGG YNSSTDTASN AVYYYARSFV PNPDGKLATG

301 MTTQNTVEID GVKNVLLIPS LTVKNRGGKA FVRVLGADGK AVEREIRTGM

351 KDSMNTEVKS GLKEGDKVVI SEITAAEQQE SGERALGGPP RR*

ORF85ng and ORF85-1 show 96.1% identity in 334 aa overlap: 

           30        40        50        60        70        80
orf85ng    PQAAYITETVRRGDISRTVSATGEISPSNLVSVGAQASGQIKKLYVKLGQQVKKGDLIAE
                                         |||||||||||| |||||||||||||||||
orf85-1                                  VSVGAQASGQIKILYVKLGQQVKKGDLIAE
                                                 10        20        30

           90       100       110       120       130       140
orf85ng    INSTTQTNTIDMEKSKLETYQAKLVSAQIALGSAEKKYKRQAALWKDDATSKEDLESAQD
           ||||:||||:: ||||||||||||||||||||||||||||||||||::||||||||||||
orf85-1    INSTSQTNTLNTEKSKLETYQAKLVSAQIALGSAEKKYKRQAALWKENATSKEDLESAQD
                   40        50        60        70        80        90

          150       160       170       180       190       200
orf85ng    ALAAAKANVAELKALIRQSKISINTAESDLGYTRITATMDGTVVAIPVEEGQTVNAAQST
           |:||||||||||||||||||||||||||:||||||||||||||||| |||||||||||||
orf85-1    AFAAAKANVAELKALIRQSKISINTAESELGYTRITATMDGTVVAILVEEGQTVNAAQST
                  100       110       120       130       140       150

          210       220       230       240       250       260
orf85ng    PTIVQLANLDMMLNKMQIAEGDITKVKAGQDISFTILSEPDTPIKAKLDSVDPGLTTMSS
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf85-1    PTIVQLANLDMMLNKMQIAEGDITKVKAGQDISFTILSEPDTPIKAKLDSVDPGLTTMSS
                  160       170       180       190       200       210

          270       280       290       300       310       320
orf85ng    GGYNSSTDTASNAVYYYARSFVPNPDGKLATGMTTQNTVEIDGVKNVLLIPSLTVKNRGG
           ||||||||||||||||||||||||||||||||||||||||||||||||:|||||||||||
orf85-1    GGYNSSTDTASNAVYYYARSFVPNPDGKLATGMTTQNTVEIDGVKNVLIIPSLTVKNRGG
                  220       230       240       250       260       270

          330       340       350       360       370       380
orf85ng    KAFVRVLGADGKAVEREIRTGMKDSMNTEVKSGLKEGDKVVISEITAAEQQESGERALGG
           |||||||||||||:||||||||:|||||||||||||||||||||||||||||||||||||
orf85-1    KAFVRVLGADGKAAEREIRTGMRDSMNTEVKSGLKEGDKVVISEITAAEQQESGERALGG
                  280       290       300       310       320       330

          390
orf85ng    PPRRX
           |||||
orf85-1    PPRRX



In addition, ORF85ng shows significant homology to an E. coli membrane fusion protein:

gi|1787104 (AE000189) o380; 27% identical (27 gaps) to 332 residues from
membrane fusion protein precursor, MTRC_NEIGO SW: P43505 (412 aa) [Escherichia
coli] Length = 380
Score = 193 bits (485), Expect = 2e-48
Identities = 120/345 (34%), Positives = 182/345 (51%), Gaps = 13/345 (3%)

Query: 29  PQAAYITETVRRGDISRTVSATGEISPSNLVSVGAQASGQIKKLYVKLGQQVKKGDLIAE 88
           P   Y T  VR GD+ ++V ATG++     V VGAQ SGQ+K L V +G +VKK  L+
Sbjct: 41  PVPTYQTLIVRPGDLQQSVLATGKLDALRKVDVGAQVSGQLKTLSVAIGDKVKKDQLLGV 100

Query: 89  INSTTQTNTIDMEKSKLETYQAKLVSAQIALGSAEKKYKRQAALWKDDATSKEXXXXXXX 148
           I+     N I   ++ L   +A+   A+  L  A   Y RQ  L +  A S++
Sbjct: 101 IDPEQAENQIKEVEATLMELRAQRQQAEAELKLARVTYSRQQRLAQTKAVSQQDLDTAAT 160

Query: 149 XXXXXXXXXXXXXXXIRQSKISINTAESDLGYTRITATMDGTVVAIPVEEGQTVNAAQST 208
                          I++++ S++TA+++L YTRI A M G V  I   +GQTV AAQ
Sbjct: 161 EMAVKQAQIGTIDAQIKRNQASLDTAKTNLDYTRIVAPMAGEVTQITTLQGQTVIAAQQA 220

Query: 209 PTIVQLANLDMMLNKMQIAEGDITKVKAGQDISFTILSEPDTPIKAKLDSVDPGLTTMSS 268
           P I+ LA++ ML K Q++E D+ +K GQ FT+L +P T + ++ VP
Sbjct: 221 PN I LTLADMS AMLVKAQVSEADVI HLKPGQKAWFTVLGD PLTRYEGQIKDVLP 273

Query: 269 GGYNSSTDTASNAVYYYARSFVPNPDGKLATGMTTQNTVEIDGVKNVLLIPSLTVKNRGG 328
           + + ++A++YYAR VPNP+G L MT Q +++ VKNVL IP + + G
Sbjct: 274 T PEKVN DAI FYYARFEVPN PNGLLRLDMTAQVH I QLT DVKNVLT I PLS ALGDPVG 328

Query: 329 KAFVRV-LGADGKAVEREIRTGMKDSMNTEVKSGLKEGDKVVISE 372
               +V L  +G+  ERE+  G ++  + E+  GL+ GD+VVI E
Sbjct: 329 DNRYKVKLLRNGETREREVTIGARNDTDVEIVKGLEAGDEVVIGE 373

Based on this analysis, it was predicted that the proteins from N. meningitidis and N. gonorrhoeae,
and their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies.

ORF85-1 (40.4 kDa) was cloned in the pGEX vectors and expressed in E. coli as described above.
The products of protein expression and purification were analyzed by SDS-PAGE. Figure 19A
shows the results of affinity purification of the GST-fusion protein. Purified GST-fusion protein
was used to immunise mice, whose sera were used for Western blot (Figure 19B), FACS analysis
(Figure 19C), and ELISA (positive result). These experiments confirm that ORF85-1 is a
surface-exposed protein, and that it is a useful immunogen.

Example 92 



The following partial DNA sequence was identified in N. meningitidis <SEQ ID 773>:

1 ..ATTCCCGCCA CGATGACATT TGAACGCAGC GGCAATGCTT ACAAAATCGT

51 TTCGACGATT AAAGTGCCGC TATACAATAT CCGTTTCGAG TCCGGCGGTA 

101 CGGTTGTCGG CAATACCCTG CACCCTACCT ACTATAGAGA CATACGCAGG 

151 GGCAAACTGT ATGCGGAAgc CAAATTCGCC GACgGcAGCG TAACTTACGG 

201 CAAAGCGGGC GAGAGCAAAA CCGAGCAAAG CCCCAAGGCT ATGGATTTGT 
251 TCACGCTTGC CTGGCAGTTG GCGGCAAATG ACGCGAAACT CCCCCCGGGG

301 CTGAAAATCA CCAACGGCAA AAAACTTTAT TCCGTCGGCG GTTTGAATAA 

351 GGCGGGTACA GGAAAATACA GCATAGGCGG CGTGGAAACC GAAGTCGTCA 

401 AATATCGGGT GCGGCGCGGC GACGATGCGG TAATGTATTT cTTCGCACCG 

451 TCCCTGAACA ATATTCCGGC ACAAATCGGC TATACCGACG ACGGCAAAAC 

501 CTATACGCTG AAACTCAAAT CGGTGCAGAT CAACGGCCAG GCAGCCAAAC 

551 CGTAA 

This corresponds to the amino acid sequence <SEQ ID 774; ORF120>:



1 ..IPATMTFERS GNAYKIVSTI KVPLYNIRFE SGGTVVGNTL HPTYYRDIRR

51 GKLYAEAKFA DGSVTYGKAG ESKTEQSPKA MDLFTLAWQL AANDAKLPPG

101 LKITNGKKLY SVGGLNKAGT GKYSIGGVET EVVKYRVRRG DDAVMYFFAP

151 SLNNIPAQIG YTDDGKTYTL KLKSVQINGQ AAKP*

Further work revealed the complete nucleotide sequence <SEQ ID 775>: 



1 ATGATGAAGA CTTTTAAAAA TATATTTTCC GCCGCCATTT TGTCCGCCGC 

51 CCTGCCGTGC GCGTATGCGG CAGGGCTGCC CCAATCCGCC GTGCTGCACT 

101 ATTCCGGCAG CTACGGCATT CCCGCCACGA TGACATTTGA ACGCAGCGGC 

151 AATGCTTACA AAATCGTTTC GACGATTAAA GTGCCGCTAT ACAATATCCG 

201 TTTCGAGTCC GGCGGTACGG TTGTCGGCAA TACCCTGCAC CCTACCTACT 

251 ATAGAGACAT ACGCAGGGGC AAACTGTATG CGGAAGCCAA ATTCGCCGAC 

301 GGCAGCGTAA CTTACGGCAA AGCGGGCGAG AGCAAAACCG AGCAAAGCCC 

351 CAAGGCTATG GATTTGTTCA CGCTTGCCTG GCAGTTGGCG GCAAATGACG 

401 CGAAACTCCC CCCGGGGCTG AAAATCACCA ACGGCAAAAA ACTTTATTCC 

451 GTCGGCGGTT TGAATAAGGC GGGTACAGGA AAATACAGCA TAGGCGGCGT 

501 GGAAACCGAA GTCGTCAAAT ATCGGGTGCG GCGCGGCGAC GATGCGGTAA 

551 TGTATTTCTT CGCACCGTCC CTGAACAATA TTCCGGCACA AATCGGCTAT 

601 ACCGACGACG GCAAAACCTA TACGCTGAAA CTCAAATCGG TGCAGATCAA 

651 CGGCCAGGCA GCCAAACCGT AA 

This corresponds to the amino acid sequence <SEQ ID 776; ORF120-1>: 

1 MMKTFKNIFS AAILSAALPC AYAAGLPQSA VLHYSGSYGI PATMTFERSG

51 NAYKIVSTIK VPLYNIRFES GGTVVGNTLH PTYYRDIRRG KLYAEAKFAD

101 GSVTYGKAGE SKTEQSPKAM DLFTLAWQLA ANDAKLPPGL KITNGKKLYS

151 VGGLNKAGTG KYSIGGVETE VVKYRVRRGD DAVMYFFAPS LNNIPAQIGY

201 TDDGKTYTLK LKSVQINGQA AKP* 
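Each "encodes a protein having amino acid sequence" statement is a translation of the listed nucleotide sequence in its first forward reading frame. The following Python sketch shows that step under stated assumptions: the standard genetic code (NCBI translation table 1) is used, the input contains bases only (the spaces between the 10-base blocks are ignored, but position numbers must be stripped first), and any codon containing a base other than A, C, G or T is reported as 'X' rather than resolved. The names `CODON_TABLE` and `translate` are illustrative.

```python
BASES = "TCAG"
# Standard genetic code (NCBI table 1); codons enumerated in TCAG order.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate reading frame 1; '*' marks a stop, 'X' an ambiguous codon."""
    seq = "".join(dna.split()).upper()  # listings group bases in blocks of 10
    usable = len(seq) - len(seq) % 3    # ignore a trailing partial codon
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, usable, 3))


# Nucleotides 1-30 and 211-240 of the ORF120-1 sequence <SEQ ID 775>.
print(translate("ATGATGAAGA CTTTTAAAAA TATATTTTCC"))  # MMKTFKNIFS
print(translate("GGCGGTACGG TTGTCGGCAA TACCCTGCAC"))  # GGTVVGNTLH
```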

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A)

ORF120 shows 92.4% identity over a 184 aa overlap with an ORF (ORF120a) from strain A of
N. meningitidis:

                                                 10        20        30
orf120.pep                               IPATMTFERSGNAYKIVSTIKVPLYNIRFE
                                         ||||       || ||||||||||||||||
orf120a    SAAILSAALPCAYAAGLPXSAVLHYSGSYGIPATXXXXXXXNAXKIVSTIKVPLYNIRFE
          10        20        30        40        50        60

                   40        50        60        70        80        90
orf120.pep SGGTVVGNTLHPTYYRDIRRGKLYAEAKFADGSVTYGKAGESKTEQSPKAMDLFTLAWQL
           |||||||||||||||||||||||||||||||||||||||      |||||||||||||||
orf120a    SGGTVVGNTLHPTYYRDIRRGKLYAEAKFADGSVTYGKAXXXXXXQSPKAMDLFTLAWQL
          70        80        90       100       110       120

                  100       110       120       130       140       150
orf120.pep AANDAKLPPGLKITNGKKLYSVGGLNKAGTGKYSIGGVETEVVKYRVRRGDDAVMYFFAP
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf120a    AANDAKLPPGLKITNGKKLYSVGGLNKAGTGKYSIGGVETEVVKYRVRRGDDAVMYFFAP
         130       140       150       160       170       180

                  160       170       180
orf120.pep SLNNIPAQIGYTDDGKTYTLKLKSVQINGQAAKPX
           |||||||||||||||||||||||||||||||||||
orf120a    SLNNIPAQIGYTDDGKTYTLKLKSVQINGQAAKPX
         190       200       210       220

The complete length ORF120a nucleotide sequence <SEQ ID 777> is: 



1 ATGATGAAGA CTTTTAAAAA TATATTTTCC GCCGCCATTT TGTCCGCCGC 

51 CCTGCCGTGC GCGTATGCGG CAGGGCTGCC CNAATCCGCC GTGCTGCACT 

101 ATTCCGGCAG CTACGGCATT CCCGCCACNA NNANNTNNGN ACNNNGNGNC 

151 AATGCTTNCA AAATCGTTTC GACGATTAAA GTGCCGCTAT ACAATATCCG 

201 TTTCGAGTCC GGCGGTACGG TTGTCGGCAA TACCCTGCAC CCTACCTACT 

251 ATAGAGACAT ACGCAGGGGC AAACTGTATG CGGAAGCCAA ATTCGCCGAC 

301 GGCAGCGTAA CCTACGGCAA AGCGGNNNNN ANCNNNNNNG NGCAAAGCCC 

351 CAAGGCTATG GATTTGTTCA CGCTTGCNTG GCAGTTGGCG GCAAATGACG 

401 CGAAACTCCC CCCGGGGCTG AAAATCACCA ACGGCAAAAA ACTTTATTCC 

451 GTCGGCGGTT TGAATAAGGC GGGTACAGGA AAATACAGCA TAGGCGGCGT 

501 GGAAACCGAA GTCGTCAAAT ATCGGGTGCG GCGCGGCGAC GATGCGGTAA 

551 TGTATTTCTT CGCACCGTCC CTGAACAATA TTCCGGCACA AATCGGCTAT 

601 ACCGACGACG GCAAAACCTA TACGCTGAAA CTCAAATCGG TGCAGATCAA 

651 CGGCCAGGCA GCCAAACCGT AA 

This encodes a protein having amino acid sequence <SEQ ID 778>: 



1 MMKTFKNIFS AAILSAALPC AYAAGLPXSA VLHYSGSYGI PATXXXXXXX

51 NAXKIVSTIK VPLYNIRFES GGTVVGNTLH PTYYRDIRRG KLYAEAKFAD

101 GSVTYGKAXX XXXXQSPKAM DLFTLAWQLA ANDAKLPPGL KITNGKKLYS

151 VGGLNKAGTG KYSIGGVETE VVKYRVRRGD DAVMYFFAPS LNNIPAQIGY

201 TDDGKTYTLK LKSVQINGQA AKP* 

ORF120a and ORF120-1 show 93.3% identity in 223 aa overlap: 



                    10        20        30        40        50        60
orf120a.pep MMKTFKNIFSAAILSAALPCAYAAGLPXSAVLHYSGSYGIPATXXXXXXXNAXKIVSTIK
            ||||||||||||||||||||||||||| |||||||||||||||       || |||||||
orf120-1    MMKTFKNIFSAAILSAALPCAYAAGLPQSAVLHYSGSYGIPATMTFERSGNAYKIVSTIK
                    10        20        30        40        50        60

                    70        80        90       100       110       120
orf120a.pep VPLYNIRFESGGTVVGNTLHPTYYRDIRRGKLYAEAKFADGSVTYGKAXXXXXXQSPKAM
            ||||||||||||||||||||||||||||||||||||||||||||||||      ||||||
orf120-1    VPLYNIRFESGGTVVGNTLHPTYYRDIRRGKLYAEAKFADGSVTYGKAGESKTEQSPKAM
                    70        80        90       100       110       120

                   130       140       150       160       170       180
orf120a.pep DLFTLAWQLAANDAKLPPGLKITNGKKLYSVGGLNKAGTGKYSIGGVETEVVKYRVRRGD
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf120-1    DLFTLAWQLAANDAKLPPGLKITNGKKLYSVGGLNKAGTGKYSIGGVETEVVKYRVRRGD
                   130       140       150       160       170       180

                   190       200       210       220
orf120a.pep DAVMYFFAPSLNNIPAQIGYTDDGKTYTLKLKSVQINGQAAKPX
            ||||||||||||||||||||||||||||||||||||||||||||
orf120-1    DAVMYFFAPSLNNIPAQIGYTDDGKTYTLKLKSVQINGQAAKPX
                   190       200       210       220



Homology with a predicted ORF from N. gonorrhoeae 

ORF120 shows 97.8% identity over 184 aa overlap with a predicted ORF (ORF120ng) from 
N. gonorrhoeae: 



orf120.pep                               IPATMTFERSGNAYKIVSTIKVPLYNIRFE 30
                                         ||||||||||||||||||||||||||||||
orf120ng   SAAILSAALPCAYAARLPQSAVLHYSGSYGIPATMTFERSGNAYKIVSTIKVPLYNIRFE 69

orf120.pep SGGTVVGNTLHPTYYRDIRRGKLYAEAKFADGSVTYGKAGESKTEQSPKAMDLFTLAWQL 90
           ||||||||||||:||:||||||||||||||||||||||||||||||||||||||||||||
orf120ng   SGGTVVGNTLHPAYYKDIRRGKLYAEAKFADGSVTYGKAGESKTEQSPKAMDLFTLAWQL 129

orf120.pep AANDAKLPPGLKITNGKKLYSVGGLNKAGTGKYSIGGVETEVVKYRVRRGDDAVMYFFAP 150
           ||||||||||||||||||||||||||||||||||||||||||||||||||||:| |||||
orf120ng   AANDAKLPPGLKITNGKKLYSVGGLNKAGTGKYSIGGVETEVVKYRVRRGDDTVTYFFAP 189

orf120.pep SLNNIPAQIGYTDDGKTYTLKLKSVQINGQAAKP 184
           ||||||||||||||||||||||||||||||||||
orf120ng   SLNNIPAQIGYTDDGKTYTLKLKSVQINGQAAKP 223

The complete length ORF120ng nucleotide sequence <SEQ ID 779> is: 



1 ATGATGAAGA CTTTTAAAAA TATATTTTCC GCCGCCATTT TGTCCGCCGC 

51 CCTGCCGTGC GCGTATGCGG CAAGGCTACC CCAATCCGCC GTGCTGCACT 

101 ATTCCGGCAG CTACGGCATT CCCGCCACGA TGACATTTGA ACGCAGCGGC 

151 AATGCTTACA AAATCGTTTC GACGATTAAA GTGCCGCTAT ACAATATCCG 

201 TTTCGAATCC GGCGGTACGG TTGTCGGCAA TACCCTGCAC CCTGCCTACT 

251 ATAAAGACAT ACGCAGGGGC AAACTGTATG CGGAAGCCAA ATTCGCCGAC 

301 GGCAGCGTAA CCTACGGCAA AGCGGGCGAG AGCAAAACCG AGCAAAGCCC 

351 CAAGGCTATG GATTTGTTCA CGCTTGCCTG GCAGTTGGCG GCAAATGACG 

401 CGAAACTCCC CCCGGGTCTG AAAATCACCA ACGGCAAAAA ACTTTATTCC 

451 GTCGGCGGCC TGAATAAGGC GGGTACGGGA AAATACAGCA TaggCGGCGT 

501 GGAAACCGAA GTCGTCAAAT ATCGGGTGCG GCGCGGCGAC GATACGGTAA 

551 CGTATTTCTT CGCACCGTCC CTGAACAATA TTCCGGCACA AATCGGCTAT 

601 ACCGACGACG GCAAAACCTA TACGCTGAAG CTCAAATCGG TGCAGATCAA 

651 CGGACAGGCC GCCAAACCGT AA 

This encodes a protein having amino acid sequence <SEQ ID 780>: 



1 MMKTFKNIFS AAILSAALPC AYAARLPQSA VLHYSGSYGI PATMTFERSG

51 NAYKIVSTIK VPLYNIRFES GGTVVGNTLH PAYYKDIRRG KLYAEAKFAD

101 GSVTYGKAGE SKTEQSPKAM DLFTLAWQLA ANDAKLPPGL KITNGKKLYS

151 VGGLNKAGTG KYSIGGVETE VVKYRVRRGD DTVTYFFAPS LNNIPAQIGY

201 TDDGKTYTLK LKSVQINGQA AKP* 

In comparison with ORF120-1, ORF120ng shows 97.8% identity in 223 aa overlap: 



                     10        20        30        40        50        60
orf120-1.pep MMKTFKNIFSAAILSAALPCAYAAGLPQSAVLHYSGSYGIPATMTFERSGNAYKIVSTIK
             |||||||||||||||||||||||| |||||||||||||||||||||||||||||||||||
orf120ng     MMKTFKNIFSAAILSAALPCAYAARLPQSAVLHYSGSYGIPATMTFERSGNAYKIVSTIK
                     10        20        30        40        50        60

                     70        80        90       100       110       120
orf120-1.pep VPLYNIRFESGGTVVGNTLHPTYYRDIRRGKLYAEAKFADGSVTYGKAGESKTEQSPKAM
             |||||||||||||||||||||:||:|||||||||||||||||||||||||||||||||||
orf120ng     VPLYNIRFESGGTVVGNTLHPAYYKDIRRGKLYAEAKFADGSVTYGKAGESKTEQSPKAM
                     70        80        90       100       110       120

                    130       140       150       160       170       180
orf120-1.pep DLFTLAWQLAANDAKLPPGLKITNGKKLYSVGGLNKAGTGKYSIGGVETEVVKYRVRRGD
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf120ng     DLFTLAWQLAANDAKLPPGLKITNGKKLYSVGGLNKAGTGKYSIGGVETEVVKYRVRRGD
                    130       140       150       160       170       180

                    190       200       210       220
orf120-1.pep DAVMYFFAPSLNNIPAQIGYTDDGKTYTLKLKSVQINGQAAKPX
             |:| ||||||||||||||||||||||||||||||||||||||||
orf120ng     DTVTYFFAPSLNNIPAQIGYTDDGKTYTLKLKSVQINGQAAKPX
                    190       200       210       220

This analysis, including the presence of a putative leader sequence in the gonococcal protein,
suggests that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be
useful antigens for vaccines or diagnostics, or for raising antibodies.



Example 93 



The following partial DNA sequence was identified in N. meningitidis <SEQ ID 781>:
1 ATGTATCGGA GGAAAGGGCG GGGCATCAAG CCGTGGATGG GTGCCGGTGC

51 .GCGTTTGCC GCCTTGGTCT GGCTGGTTTT CGCGCTCGGC GATACTTTGA

101 CTCCGTTTGC GGTTGCGGCG GTGCTGGCGT ATGTATTGGA CCCTTTGGTC 

151 GAATGGTTGC AGAAAAAGGG TTTGAACCGT GCATCCGCTT CGATGTCTGT 

201 GATGGTGTTT TCCTTGATTT TGTTGTTGGC ATTATTGTTG ATTATCGTCC 

251 CTATGCTGGT CGGGCAGTTC AACAATTTGG CATCGCGCCT GCCCCAATTA 

301 ATCGGTTTTA TGCAGAACAC GCTGCTGCCG TGGTTGAAAA ATACAATCGG 

351 CGGATATGTG GAAATCGATC AGGCATCTAT TATTGCGTGG CTTCAGGCGC 

401 ATACGGGAGA GTTGAGCAAC GCGCTTAAGG CGTGGTTTCC CGTTTTGATG 

451 AGGCAGGGCG GCAATATT..

This corresponds to the amino acid sequence <SEQ ID 782; ORF121>: 



1 MYRRKGRGIK PWMGAGXAFA ALVWLVFALG DTLTPFAVAA VLAYVLDPLV 
51 EWLQKKGLNR ASASMSVMVF SLILLLALLL IIVPMLVGQF NNLASRLPQL 
101 IGFMQNTLLP WLKNTIGGYV EIDQASIIAW LQAHTGELSN ALKAWFPVLM 
151 RQGGNI..

Further work revealed the complete nucleotide sequence <SEQ ID 783>: 



1 ATGTATCGGA GGAAAGGGCG GGGCATCAAG CCGTGGATGG GTGCCGGTGC

51 GGCGTTTGCC GCCTTGGTCT GGCTGGTTTT CGCGCTCGGC GATACTTTGA

101 CTCCGTTTGC GGTTGCGGCG GTGCTGGCGT ATGTATTGGA CCCTTTGGTC

151 GAATGGTTGC AGAAAAAGGG TTTGAACCGT GCATCCGCTT CGATGTCTGT

201 GATGGTGTTT TCCTTGATTT TGTTGTTGGC ATTATTGTTG ATTATCGTCC

251 CTATGCTGGT CGGGCAGTTC AACAATTTGG CATCGCGCCT GCCCCAATTA

301 ATCGGTTTTA TGCAGAACAC GCTGCTGCCG TGGTTGAAAA ATACAATCGG

351 CGGATATGTG GAAATCGATC AGGCATCTAT TATTGCGTGG CTTCAGGCGC

401 ATACGGGAGA GTTGAGCAAC GCGCTTAAGG CGTGGTTTCC CGTTTTGATG

451 AGGCAGGGCG GCAATATTGT CAGCAGTATC GGCAACCTGC TGCTGCTTCC

501 CTTGCTGCTT TACTATTTCC TGCTGGATTG GCAGCGGTGG TCGTGCGGCA

551 TTGCCAAACT GGTTCCGAgG CGTTTTGCCG GTGCTTATAC GCGCATTACA

601 GGCAATTTGA ACGAGGTATT GGGCGAATTT TTGCGCGGGC AGCTTCTGGT

651 AATGCTGATT ATGGGCTTGG TTTACGGTTT GGGATTGGTG CTGGTCGGGC

701 TGGATTCGGG GTTTGCCATC GGTATGCTTG CCGGTATTTT GGTGTTTGTC

751 CCTTATCTCG GGGCGTTTAC GGGATTGCTG CTTGCCACCG TCGCCGCCTT

801 GCTCCAGTTC GGTTCGTGGA ACGGCATCCT ATCGGTTTGG GCGGTTTTTG

851 CCGTAGGACA GTTTCTCGAA AGTTTTTTCA TTACGCCGAA AATCGTGGGA

901 GACCGTATCG GGCTGTCGCC GTTTTGGGTT ATCTTTTCGC TGATGGCGTT

951 CGGGCAGCTG ATGGGCTTTG TCGGAATGTT GGCGGGATTG CCTTTGGCCG

1001 CCGTAACCTT GGTCTTGCTT CGCGAGGGCG TGCAGAAATA TTTTGCCGGC

1051 AGTTTTTACC GGGGCAGGTA G



This corresponds to the amino acid sequence <SEQ ID 784; ORF121-1>:



1 MYRRKGRGIK PWMGAGAAFA ALVWLVFALG DTLTPFAVAA VLAYVLDPLV
51 EWLQKKGLNR ASASMSVMVF SLILLLALLL IIVPMLVGQF NNLASRLPQL
101 IGFMQNTLLP WLKNTIGGYV EIDQASIIAW LQAHTGELSN ALKAWFPVLM
151 RQGGNIVSSI GNLLLLPLLL YYFLLDWQRW SCGIAKLVPR RFAGAYTRIT
201 GNLNEVLGEF LRGQLLVMLI MGLVYGLGLV LVGLDSGFAI GMLAGILVFV
251 PYLGAFTGLL LATVAALLQF GSWNGILSVW AVFAVGQFLE SFFITPKIVG
301 DRIGLSPFWV IFSLMAFGQL MGFVGMLAGL PLAAVTLVLL REGVQKYFAG
351 SFYRGR*

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A) 

ORF121 shows 98.7% identity over a 156 aa overlap with an ORF (ORF121a) from strain A of
N. meningitidis:

                   10        20        30        40        50        60
orf121.pep MYRRKGRGIKPWMGAGXAFAALVWLVFALGDTLTPFAVAAVLAYVLDPLVEWLQKKGLNR
           ||||||||||||| || |||||||||||||||||||||||||||||||||||||||||||
orf121a    MYRRKGRGIKPWMDAGAAFAALVWLVFALGDTLTPFAVAAVLAYVLDPLVEWLQKKGLNR
                   10        20        30        40        50        60

                   70        80        90       100       110       120
orf121.pep ASASMSVMVFSLILLLALLLIIVPMLVGQFNNLASRLPQLIGFMQNTLLPWLKNTIGGYV
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf121a    ASASMSVMVFSLILLLALLLIIVPMLVGQFNNLASRLPQLIGFMQNTLLPWLKNTIGGYV
                   70        80        90       100       110       120

                  130       140       150
orf121.pep EIDQASIIAWLQAHTGELSNALKAWFPVLMRQGGNI
           ||||||||||||||||||||||||||||||||||||
orf121a    EIDQASIIAWLQAHTGELSNALKAWFPVLMRQGGNIVSSIGNLLLLPLLLYYFLLDWQRW
                  130       140       150       160       170       180

orf121a    SCGIAKLVPRRFAGAYTRITGNLNEVLGEFLRGQLLVMLIMGLVYGLGLVLVGLDSGFAI
                  190       200       210       220       230       240

The complete length ORF121a nucleotide sequence <SEQ ID 785> is: 



1 ATGTATCGGA GGAAAGGGCG GGGCATCAAG CCGTGGATGG ATGCCGGTGC 

51 GGCGTTTGCC GCCTTGGTCT GGCTGGTTTT CGCGCTCGGC GATACTTTGA 

101 CTCCGTTTGC GGTTGCGGCG GTGCTGGCGT ATGTATTGGA CCCTTTGGTC 

151 GAATGGTTGC AGAAAAAGGG TTTGAACCGT GCATCCGCTT CGATGTCTGT 

201 GATGGTGTTT TCCTTGATTT TGTTGTTGGC ATTATTGTTG ATTATTGTCC 

251 CTATGCTGGT CGGGCAGTTC AACAATTTGG CATCGCGCCT GCCCCAATTA 

301 ATCGGTTTTA TGCAGAACAC GCTGCTGCCG TGGTTGAAAA ATACAATCGG 

351 CGGATATGTG GAAATCGATC AGGCATCTAT TATTGCGTGG CTTCAGGCGC 

401 ATACGGGCGA GTTGAGCAAC GCGCTTAAGG CGTGGTTTCC CGTTTTGATG 

451 AGGCAGGGCG GCAATATTGT CAGCAGTATC GGCAACCTGC TGCTGCTTCC 

501 CTTGCTGCTT TACTATTTCC TGCTGGATTG GCAGCGGTGG TCGTGCGGCA 

551 TTGCCAAACT GGTTCCGAGG CGTTTTGCCG GTGCTTATAC GCGCATTACA 

601 GGCAATTTGA ACGAGGTATT GGGCGAATTT TTGCGCGGGC AGCTTCTGGT 

651 GATGCTGATT ATGGGTTTGG TTTACGGCTT GGGGTTGGTG CTGGTCGGGC 

701 TGGATTCGGG GTTTGCAATC GGTATGGTTG CCGGTATTTT GGTTTTTGTT 

751 CCCTATTTGG GCGCGTTTAC AGGACTGCTG CTGGCAACCG TCGCCGCCTT 

801 GCTCCAGTTC GGTTCGTGGA ACGGCATCTT GGCTGTTTGG GCGGTTTTTG 

851 CCGTAGGACA GTTTCTCGAA AGTTTTTTCA TTACGCCGAA AATCGTGGGA 

901 GACCGTATCG GCCTGTCGCC GTTTTGGGTT ATCTTTTCGC TGATGGCGTT 

951 CGGGCAGCTG ATGGGCTTTG TCGGAATGTT GGCCGGATTG CCTTTGGCCG 

1001 CCGTAACCTT GGTCTTGCTT CGCGAGGGCG TGCAGAAATA TTTTGCCGGC 

1051 AGTTTTTACC GGGGCAGGTA G 

This encodes a protein having amino acid sequence <SEQ ID 786>: 

1 MYRRKGRGIK PWMDAGAAFA ALVWLVFALG DTLTPFAVAA VLAYVLDPLV

51 EWLQKKGLNR ASASMSVMVF SLILLLALLL IIVPMLVGQF NNLASRLPQL

101 IGFMQNTLLP WLKNTIGGYV EIDQASIIAW LQAHTGELSN ALKAWFPVLM

151 RQGGNIVSSI GNLLLLPLLL YYFLLDWQRW SCGIAKLVPR RFAGAYTRIT

201 GNLNEVLGEF LRGQLLVMLI MGLVYGLGLV LVGLDSGFAI GMVAGILVFV

251 PYLGAFTGLL LATVAALLQF GSWNGILAVW AVFAVGQFLE SFFITPKIVG

301 DRIGLSPFWV IFSLMAFGQL MGFVGMLAGL PLAAVTLVLL REGVQKYFAG

351 SFYRGR*

ORF121a and ORF121-1 show 99.2% identity in 356 aa overlap:

                    10        20        30        40        50        60
orf121a.pep MYRRKGRGIKPWMDAGAAFAALVWLVFALGDTLTPFAVAAVLAYVLDPLVEWLQKKGLNR
            ||||||||||||| ||||||||||||||||||||||||||||||||||||||||||||||
orf121-1    MYRRKGRGIKPWMGAGAAFAALVWLVFALGDTLTPFAVAAVLAYVLDPLVEWLQKKGLNR
                    10        20        30        40        50        60

                    70        80        90       100       110       120
orf121a.pep ASASMSVMVFSLILLLALLLIIVPMLVGQFNNLASRLPQLIGFMQNTLLPWLKNTIGGYV
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf121-1    ASASMSVMVFSLILLLALLLIIVPMLVGQFNNLASRLPQLIGFMQNTLLPWLKNTIGGYV
                    70        80        90       100       110       120

                   130       140       150       160       170       180
orf121a.pep EIDQASIIAWLQAHTGELSNALKAWFPVLMRQGGNIVSSIGNLLLLPLLLYYFLLDWQRW
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf121-1    EIDQASIIAWLQAHTGELSNALKAWFPVLMRQGGNIVSSIGNLLLLPLLLYYFLLDWQRW
                   130       140       150       160       170       180

                   190       200       210       220       230       240
orf121a.pep SCGIAKLVPRRFAGAYTRITGNLNEVLGEFLRGQLLVMLIMGLVYGLGLVLVGLDSGFAI
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf121-1    SCGIAKLVPRRFAGAYTRITGNLNEVLGEFLRGQLLVMLIMGLVYGLGLVLVGLDSGFAI
                   190       200       210       220       230       240

                   250       260       270       280       290       300
orf121a.pep GMVAGILVFVPYLGAFTGLLLATVAALLQFGSWNGILAVWAVFAVGQFLESFFITPKIVG
            ||:||||||||||||||||||||||||||||||||||:||||||||||||||||||||||
orf121-1    GMLAGILVFVPYLGAFTGLLLATVAALLQFGSWNGILSVWAVFAVGQFLESFFITPKIVG
                   250       260       270       280       290       300

                   310       320       330       340       350
orf121a.pep DRIGLSPFWVIFSLMAFGQLMGFVGMLAGLPLAAVTLVLLREGVQKYFAGSFYRGRX
            |||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf121-1    DRIGLSPFWVIFSLMAFGQLMGFVGMLAGLPLAAVTLVLLREGVQKYFAGSFYRGRX
                   310       320       330       340       350

Homology with a predicted ORF from N. gonorrhoeae

ORF121 shows 97.4% identity over a 156 aa overlap with a predicted ORF (ORF121ng) from 
N. gonorrhoeae: 

orf121.pep MYRRKGRGIKPWMGAGXAFAALVWLVFALGDTLTPFAVAAVLAYVLDPLVEWLQKKGLNR 60
           |||||||||||||||| |||||||||:|||||||||||||||||||||||||||||||||
orf121ng   MYRRKGRGIKPWMGAGAAFAALVWLVYALGDTLTPFAVAAVLAYVLDPLVEWLQKKGLNR 60

orf121.pep ASASMSVMVFSLILLLALLLIIVPMLVGQFNNLASRLPQLIGFMQNTLLPWLKNTIGGYV 120
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf121ng   ASASMSVMVFSLILLLALLLIIVPMLVGQFNNLASRLPQLIGFMQNTLLPWLKNTIGGYV 120

orf121.pep EIDQASIIAWLQAHTGELSNALKAWFPVLMRQGGNI 156
           ||||||||||:|||||||||||||||||||:|||||
orf121ng   EIDQASIIAWFQAHTGELSNALKAWFPVLMKQGGNIVSTIGNLLLPPLLLYYFLLDWHRW 180

An ORF121ng nucleotide sequence <SEQ ID 787> was predicted to encode a protein having amino 
acid sequence <SEQ ID 788>: 

1 MYRRKGRGIK PWMGAGAAFA ALVWLVYALG DTLTPFAVAA VLAYVLDPLV

51 EWLQKKGLNR ASASMSVMVF SLILLLALLL IIVPMLVGQF NNLASRLPQL

101 IGFMQNTLLP WLKNTIGGYV EIDQASIIAW FQAHTGELSN ALKAWFPVLM

151 KQGGNIVSTI GNLLLPPLLL YYFLLDWHRW SCGIPKLVPR RFAGAYTRIT

201 GNLNKVWGKF LRGQLLGETE RGAWCRVGR ECWEGGGARS RPSDDGWPRW 

251 GGG* 

Further work revealed the following gonococcal DNA sequence <SEQ ID 789>:

1 ATGTATCGGA GAAAAGGACG GGGCATCAAG CCGTGGATGG GTGCCGGCGC 

51 GGCGTTTGCC GCCTTGGTCT GGCTGGTTTA CGCGCTCGGC GATACTTTGA 

101 CTCCGTTTGC GGTTGCGGCG GTGCTGGCGT ATGTGTTGGA CCCTTTGGTC 

151 GAATGGTTGC AGAAAAAGGG TTTGAACCGT GCATCCGCTT CGATGTCTGT 

201 GATGGTGTTT TCCTTGATTT TGTTGTTGGC ATTATTGTTG ATTATTGTCC 

251 CTATGCTGGT CGGGCAGTTC AATAATTTGG CATCTCGCCT GCCCCAATTA 

301 ATCGGTTTTA TGCAGAACAC GCTGCTGCCG TGGTTGAAAA ATACAATCGG 

351 CGGATATGTG GAAATCGATC AGGCATCTAT TATTGCGTGG TTTCAGGCGC 

401 ATACGGGCGA GTTGAGCAAC GCGCTTAAGG CGTGGTTTCC CGTTTTGATG 

451 AAACAGGGCG GCAATATTGT CAGCAGTATC GGCAACCTGC TGCTGCCGCC 

501 CTTGCTGCTT TACTATTTCC TGCTGGATTG GCAGCGGTGG TCGTGCGGCA 

551 TCGCCAAACT GGTTCCGAGG CGTTTTGCCG GTGCTTATAC GCGCATTACG 

601 GGTAATTTGA ACGAGGTATT GGGCGAATTT TTGCGCGGTC AGCTTCTGGT 

651 GATGCTGATT ATGGGCTTGG TTTACGGTTT GGGATTGATG CTAGTCGGAC 

701 TGGATTCGGG ATTTGCCATC GGTATGGTTG CCGGTATTTT GGTGTTTGTC 

751 CCCTATTTGG GTGCGTTTAC GGGATTGCTG CTTGCCACTG TTGCAGCCTT 

801 GCTCCAGTTC GGTTCGTGGA ACGGAATCTT GGCTGTTTGG GCGGTTTTTG 

851 CCGTCGGTCA GTTTCTCGAA AGTTTTTTCA TTACGCCGAA AATTGTAGGA 

901 GACCGTATCG GCCTGTCGCC GTTTTGGGTT ATCTTTTCGC TGATGGCGTT 

951 CGGAGAGCTG ATGGGCTTTG TCGGAATGTT GGCCGGATTG CCTTTGGCCG 

1001 CCGTAACCTT GGTCTTGCTT CGCGAGGGCG CGCAGAAATA TTTTGCCGGC 

1051 AGTTTTTACC GGGGCAGGTA G 
This corresponds to the amino acid sequence <SEQ ID 790; ORF121ng-1>:

1 MYRRKGRGIK PWMGAGAAFA ALVWLVYALG DTLTPFAVAA VLAYVLDPLV

51 EWLQKKGLNR ASASMSVMVF SLILLLALLL IIVPMLVGQF NNLASRLPQL

101 IGFMQNTLLP WLKNTIGGYV EIDQASIIAW FQAHTGELSN ALKAWFPVLM

151 KQGGNIVSSI GNLLLPPLLL YYFLLDWQRW SCGIAKLVPR RFAGAYTRIT

201 GNLNEVLGEF LRGQLLVMLI MGLVYGLGLM LVGLDSGFAI GMVAGILVFV

251 PYLGAFTGLL LATVAALLQF GSWNGILAVW AVFAVGQFLE SFFITPKIVG

301 DRIGLSPFWV IFSLMAFGEL MGFVGMLAGL PLAAVTLVLL REGAQKYFAG

351 SFYRGR*



ORF121ng-1 and ORF121-1 show 97.5% identity in 356 aa overlap: 

                      10        20        30        40        50        60
orf121-1.pep  MYRRKGRGIKPWMGAGAAFAALVWLVFALGDTLTPFAVAAVLAYVLDPLVEWLQKKGLNR
              ||||||||||||||||||||||||||:|||||||||||||||||||||||||||||||||
orf121ng-1    MYRRKGRGIKPWMGAGAAFAALVWLVYALGDTLTPFAVAAVLAYVLDPLVEWLQKKGLNR
                      10        20        30        40        50        60

                      70        80        90       100       110       120
orf121-1.pep  ASASMSVMVFSLILLLALLLIIVPMLVGQFNNLASRLPQLIGFMQNTLLPWLKNTIGGYV
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf121ng-1    ASASMSVMVFSLILLLALLLIIVPMLVGQFNNLASRLPQLIGFMQNTLLPWLKNTIGGYV
                      70        80        90       100       110       120

                     130       140       150       160       170       180
orf121-1.pep  EIDQASIIAWLQAHTGELSNALKAWFPVLMRQGGNIVSSIGNLLLLPLLLYYFLLDWQRW
              ||||||||||:|||||||||||||||||||||:|||||||||||||| ||||||||||||||
orf121ng-1    EIDQASIIAWFQAHTGELSNALKAWFPVLMKQGGNIVSSIGNLLLPPLLLYYFLLDWQRW
                     130       140       150       160       170       180

                     190       200       210       220       230       240
orf121-1.pep  SCGIAKLVPRRFAGAYTRITGNLNEVLGEFLRGQLLVMLIMGLVYGLGLVLVGLDSGFAI
              |||||||||||||||||||||||||||||||||||||||||||||||||:||||||||||
orf121ng-1    SCGIAKLVPRRFAGAYTRITGNLNEVLGEFLRGQLLVMLIMGLVYGLGLMLVGLDSGFAI
                     190       200       210       220       230       240

                     250       260       270       280       290       300
orf121-1.pep  GMLAGILVFVPYLGAFTGLLLATVAALLQFGSWNGILSVWAVFAVGQFLESFFITPKIVG
              ||:||||||||||||||||||||||||||||||||||:||||||||||||||||||||||
orf121ng-1    GMVAGILVFVPYLGAFTGLLLATVAALLQFGSWNGILAVWAVFAVGQFLESFFITPKIVG
                     250       260       270       280       290       300

                     310       320       330       340       350
orf121-1.pep  DRIGLSPFWVIFSLMAFGQLMGFVGMLAGLPLAAVTLVLLREGVQKYFAGSFYRGRX
              ||||||||||||||||||:||||||||||||||||||||||||:|||||||||||||
orf121ng-1    DRIGLSPFWVIFSLMAFGELMGFVGMLAGLPLAAVTLVLLREGAQKYFAGSFYRGRX
                     310       320       330       340       350
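You can cross-check the 97.5% figure by listing the substituted positions. The following is a minimal Python sketch and is not part of the original analysis. It assumes that the two rows of a pairwise alignment are supplied as equal-length strings with '-' for gaps, and `list_substitutions` is a hypothetical helper name. Applied to residues 241-300 of the ORF121-1/ORF121ng-1 alignment, it reports the two substitutions in that block. Counting nine substitutions over the whole alignment (our own count from the rows shown) gives 347/356, i.e. 97.5%.

```python
def list_substitutions(top: str, bottom: str, start: int = 1) -> list[tuple[int, str, str]]:
    """List the non-identical columns of a pairwise alignment.

    `top` and `bottom` are the two aligned rows (same length, '-' marks a gap).
    Positions are counted along `top`, beginning at `start`; a column where
    `top` has a gap is reported at the last residue position seen in `top`.
    """
    if len(top) != len(bottom):
        raise ValueError("aligned rows must have the same length")
    differences = []
    position = start - 1
    for a, b in zip(top, bottom):
        if a != "-":
            position += 1  # advance only over real residues of the top row
        if a != b:
            differences.append((position, a, b))
    return differences


# Residues 241-300 of ORF121-1 (top row) against ORF121ng-1 (bottom row)
orf121_1 = "GMLAGILVFVPYLGAFTGLLLATVAALLQFGSWNGILSVWAVFAVGQFLESFFITPKIVG"
orf121ng_1 = "GMVAGILVFVPYLGAFTGLLLATVAALLQFGSWNGILAVWAVFAVGQFLESFFITPKIVG"
subs = list_substitutions(orf121_1, orf121ng_1, start=241)
print(subs)  # [(243, 'L', 'V'), (278, 'S', 'A')]

# Identity over the full 356-residue overlap, assuming 9 substitutions in total
print(round(100 * (356 - 9) / 356, 1))  # 97.5
```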



In addition, ORF121ng-1 shows homology to a permease from H. influenzae: 

sp|P43969|PERM_HAEIN PUTATIVE PERMEASE PERM HOMOLOG Length = 349 
Score = 69.9 bits (168), Expect = 2e-11 
Identities = 67/317 (21%), Positives = 120/317 (37%), Gaps = 7/317 (2%) 

Query: 26  VYALGDTLTPFAVAAVLAYVLDPLVEWL-QKKGLNRASASMSVMVFSXXXXXXXXXXXVP 84 
           +Y  GD + P  +A VL+Y+L+  + +L Q     R  A++ +               VP
Sbjct: 32  IYFFGDLIAPLLIALVLSYLLEIPINFLNQYLKCPRMLATILIFGSFIGLAAVFFLVLVP 91 

Query: 85  MLVGQFNNLASRLPQLIGFMQNTLLPWLKNTIGGYVE-IDQASIIAWFQAHTGELSNALK 143 
           ML  Q  +L S LP +     N    WL N    Y E ID + + + F +   ++    +
Sbjct: 92  MLWNQTISLLSDLPAMF----NKSNEWLLNLPKNYPELIDYSMVDSIFNSVREKILGFGE 147 

Query: 144 AWFPVLMKQGGNIVSSIGNXXXXXXXXXXXXXDWQRWSCGIAKLVPRRFAGAYTRITGNL 203 
           +   + +    N+VS                 D      G+++ +P+    A+ R    +
Sbjct: 148 SAVKLSLASIMNLVSLGIYAFLVPLMMFFMLKDKSELLQGVSRFLPKNRNLAFXRWK-EM 206 

Query: 204 NEVLGEFLRGQXXXXXXXXXXXXXXXXXXXXDSGFAIGMVAGILVFVPYXXXXXXXXXXX 263 
            + +  ++ G+                    +    +    G+ V VPY
Sbjct: 207 QQQISNYIHGKLLEILIVTLITYIIFLIFGLNYPLLLAFAVGLSVLVPYIGAVIVTIPVA 266 

Query: 264 XXXXXQFGSWNGILAVWAVFAVGQFLESFFITPKIVGDRIGLSPFWVIFSLMAFGELMGF 323 
                QFG       +   FAV Q L+   +  P +  + +  L P  +I S++ FG L GF
Sbjct: 267 LVALFQFGISPTFWYIIIAFAVSQLLDGNLLVPYLFSEAVNLHPLIIIISVLIFGGLWGF 326 

Query: 324 VGMLAGLPLAAVTLVLL 340 
            G+   +PLA +   ++
Sbjct: 327 WGVFFAIPLATLVKAVI 343 
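The percentages in the BLAST summary line appear to be truncated rather than rounded. For example, 120/317 is 37.85%, which would round to 38% but is reported as 37%. The sketch below reproduces the three reported values under that assumption. The truncation rule is inferred from these numbers, not taken from BLAST documentation, and `truncated_percent` is an illustrative name.

```python
def truncated_percent(count: int, length: int) -> int:
    """Integer percentage, truncated toward zero (assumed reporting convention)."""
    return (100 * count) // length


ALIGNMENT_LENGTH = 317  # columns in the ORF121ng-1 / PERM_HAEIN alignment
for label, count in [("Identities", 67), ("Positives", 120), ("Gaps", 7)]:
    pct = truncated_percent(count, ALIGNMENT_LENGTH)
    print(f"{label} = {count}/{ALIGNMENT_LENGTH} ({pct}%)")
# Identities = 67/317 (21%)
# Positives = 120/317 (37%)
# Gaps = 7/317 (2%)
```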

Based on this analysis, including the presence of a putative leader sequence and transmembrane 
domains in the two proteins, it is predicted that the proteins from N. meningitidis and 
N. gonorrhoeae, and their epitopes, could be useful antigens for vaccines or diagnostics, or for 
raising antibodies. 



Example 94 



The following partial DNA sequence was identified in N. meningitidis <SEQ ID 791>: 

1 . .ACTGCTTTTT CGGCGGCGCT GCGCTTGAGT CCATCATGAC TCGTCATATT 

51 TTTGTCCTTT GGGAAACCGT ATCAACAAAC AGCCGCCATC TTAACATTTT 

101 TTTGCACGTC CTGCCCGCCG CGTTCAAATG CGTACCAGCA ATACCGCCGC 

151 CTGCGCCTCT ATGCCTTCCA TCCGCCCGAG ATAGCCGAGT TTTTCGTTGG 

201 TTTTGCCTTT GATGTTGACG CACGAAATGT CTATGCCCAA ATCGGCGGCG 

251 ATGTTGGCAC GCATTTGCGG AATGTGCGGC GCGAGTGTGG GTTTCTGTGC 

301 AATCACGGTC GTATCGACAT TGACCGCCTG CCAACCCTGC GCCTGAACGC 

351 TTTGATACGC CGCACGCAAA AGGACGCGGC TGTCCGCATC TTTGAACTCT 

401 GCGGCGGTGT CGGGGAAATG GCTGCCGATA TCGCCCAAAC CTGCCGCACC 

451 GAGCAGCGCG TCGGTAACGG CGTGCAGCAG CGCATCGGCA TCGGAGTGTC 

501 CGAGCAGCCC TTTTTCAAAT GGGATTTCAA CTCCGCCAAG TATCAG. . 

This corresponds to the amino acid sequence <SEQ ID 792; ORF122>: 



1 . . TAFSAALRLS PSXLVIFLSF GKPYQQTAAI LTFFCTSCPP RSNAYQQYRR 

51 LRLYAFHPPE IAEFFVGFAF DVDARNVYAQ IGGDVGTHLR NVRRECGFLC 

101 NHGRIDIDRL PTLRLNALIR RTQKDAAVRI FELCGGVGEM AADIAQTCRT 

151 EQRVGNGVQQ RIGIGVSEQP FFKWDFNSAK YQ. . 

Further work revealed the complete nucleotide sequence <SEQ ID 793>: 



1 ATATCGTACT GGGCAAGCAG TTCGCCGGAT TTTTTGGAAG TAGATACCGC 

51 GCCTTTGATT TTTTTGCCGC TCTTACCCAA GGCTTCGATG AAAAAGTTGA 

101 TGGTCGAGCC GGTACCGATG CCGATATATT CATTTTCGGG TACGAATTCG 

151 ACTGCTTTTT CGGCGGCGAT GCGCTTGAGT TCGTCTTGTG TCGTCATATT 

201 TTTGTCCTTT GGGAAACCGT ATCAACAAAC AGCCGCCATC TTAACATTTT 

251 TTTGCACGTC CTGCCCGCCG CGTTCAAATG CGTACCAGCA ATACCGCCGC 

301 CTGCGCCTCT ATGCCTTCCA TCCGCCCGAG ATAGCCGAGT TTTTCGTTGG 

351 TTTTGCCTTT GATGTTGACG CACGAAATGT CTATGCCCAA ATCGGCGGCG 

401 ATGTTGGCAC GCATTTGCGG AATGTGCGGC GCGAGTTTGG GTTTCTGTGC 

451 AATCACGGTC GTATCGACAT TGACCGCCTG CCAACCCTGC GCCTGAACGC 

501 TTTGATACGC CGCACGCAAA AGGACGCGGC TGTCCGCATC TTTGAACTCT 

551 GCGGCGGTGT CGGGGAAATG GCTGCCGATA TCGCCCAAAC CTGCCGCACC 

601 GAGCAGCGCG TCGGTAACGG CGTGCAGCAG CGCATCGGCA TCGGAGTGTC 

651 CGAGCAGCCC TTTTTCAAAT GGGATTTCAA CTCCGCCAAG TATCAGCTTT 

701 CTGCCTTCGG TCAGTTGGTG GACATCGTAG CCCTGTCCGA TACGGATGTT 

751 CGTCATCGTT TGTGTTCCTG A 

This corresponds to the amino acid sequence <SEQ ID 794; ORF122-1>: 



1 ISYWASSSPD FLEVDTAPLI FLPLLPKASM KKLMVEPVPM PIYSFSGTNS 

51 TAFSAAMRLS SSCVVIFLSF GKPYQQTAAI LTFFCTSCPP RSNAYQQYRR 

101 LRLYAFHPPE IAEFFVGFAF DVDARNVYAQ IGGDVGTHLR NVRREFGFLC 

151 NHGRIDIDRL PTLRLNALIR RTQKDAAVRI FELCGGVGEM AADIAQTCRT 

201 EQRVGNGVQQ RIGIGVSEQP FFKWDFNSAK YQLSAFGQLV DIVALSDTDV 

251 RHRLCS* 
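Each amino acid sequence in this section can be checked against its nucleotide sequence by translating it codon by codon. The sketch below assumes reading frame 1 and the standard genetic code (NCBI translation table 1). `translate` is an illustrative helper, not a library function. Codons that contain an ambiguous base such as N are reported as X, although a more thorough translator could resolve some of them (GCN, for example, always encodes alanine). Listing spaces are stripped and lower-case bases are accepted. Applied to the first and last lines of SEQ ID 793, the sketch reproduces the start and end of ORF122-1.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI table 1); codons enumerated TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna: str) -> str:
    """Translate in frame 1; '*' marks a stop codon, 'X' an unresolved codon."""
    seq = "".join(dna.split()).upper()  # drop listing spaces, accept lower case
    usable = len(seq) - len(seq) % 3     # ignore a trailing partial codon
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, usable, 3))


print(translate("ATATCGTACT GGGCAAGCAG TTCGCCGGAT"))  # ISYWASSSPD (residues 1-10)
print(translate("CGTCATCGTT TGTGTTCCTG A"))           # RHRLCS* (residues 251-256 and stop)
```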

Computer analysis of this amino acid sequence gave the following results: 



WO 99/24578 






PCT/IB98/01665 



Homology with a predicted ORF from N. meningitidis (strain A) 

ORF122 shows 94.0% identity over a 182 aa overlap with an ORF (ORF122a) from strain A of N. 
meningitidis: 

                                                    10        20        30
orf122.pep                                    TAFSAALRLSPSXLVIFLSFGKPYQQTAAI
                                            ||||||:||| | :||||||||||||||||
orf122a       FLPLLPKASMKKLMVEPVPMPMYSFSGTNSTAFSAAMRLSSSCVVIFLSFGKPYQQTAAI
                      30        40        50        60        70        80

                      40        50        60        70        80        90
orf122.pep    LTFFCTSCPPRSNAYQQYRRLRLYAFHPPEIAEFFVGFAFDVDARNVYAQIGGDVGTHLR
              |||| |||||||| ||||||||||||| |||:|||||||| |||||||||||||||||||
orf122a       LTFFXTSCPPRSNPYQQYRRLRLYAFHAPEITEFFVGFAFXVDARNVYAQIGGDVGTHLR
                      90       100       110       120       130       140

                     100       110       120       130       140       150
orf122.pep    NVRRECGFLCNHGRIDIDRLPTLRLNALIRRTQKDAAVRIFELCGGVGEMAADIAQTCRT
              |:||| ||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf122a       NMRREFGFLCNHGRIDIDRLPTLRLNALIRRTQKDAAVRIFELCGGVGEMAADIAQTCRT
                     150       160       170       180       190       200

                     160       170       180
orf122.pep    EQRVGNGVQQRIGIGVSEQPFFKWDFNSAKYQ
              ||||||||||||||||||||||||||||||||
orf122a       EQRVGNGVQQRIGIGVSEQPFFKWDFNSAKYQLSAFGQLVDIVALSDTDVRHRLCSX
                     210       220       230       240       250

The complete length ORF122a nucleotide sequence <SEQ ID 795> is: 

1 ATATCATATT GGGCAAGCAG TTCACTGGAT TTTTTGGAAG TAGATACCGC 

51 GCCTTTGATT TTTTTGCCGC TCTTACCCAA GGCTTCGATG AAAAAGTTGA 

101 TGGTCGAACC GGTACCGATG CCGATGTATT CGTTTTCGGG TACGAATTCG 

151 ACTGCNTTTT CGGCGGCGAT GCGCTTGAGT TCGTCTTGTG TCGTCATATT 

201 TTTGTCCTTT GGGAAACCGT ATCAACAAAC AGCCGCCATC TTAACATTTT 

251 TTNNNACGTC CTGCCCGCCG CGTTCAAATC CTTACCAGCA ATACCGCCGC 

301 CTGCGACTCT ATGCCTTCCA TGCGCCCGAG ATAACCGAGT TTTTCGTTGG 

351 TTTTGCCTTT GANGTTGACG CACGAAATGT CTATGCCCAA ATCGGCGGCG 

401 ATGTTGGCAC GCATTTGCGG AATATGCGGC GCGAGTTTGG GTTTCTGTGC 

451 AATCACGGTC GTATCGACAT TGACCGCCTG CCAACCCTGC GCCTGAACGC 

501 TTTGATACGC CGCACGCAAA AGGACGCGGC TGTCCGCATC TTTGAACTCT 

551 GCGGCGGTGT CGGGGAAATG GCTGCCGATA TCGCCCAAAC CTGCCGCACC 

601 GAGCAGCGCG TCGGTAACGG CGTGCAGCAG CGCATCGGCA TCGGAGTGTC 

651 CGAGCAGCCC TTTTTCAAAT GGGATTTCAA CTCCGCCAAG TATCAGCTTT 

701 CTGCCTTCGG TCAGTTGGTG GACATCGTAG CCCTGTCCGA TACGGATGTT 

751 CGTCATCGTT TGTGTTCCTG A 

This encodes a protein having amino acid sequence <SEQ ID 796>: 



1 ISYWASSSLD FLEVDTAPLI FLPLLPKASM KKLMVEPVPM PMYSFSGTNS 

51 TAFSAAMRLS SSCVVIFLSF GKPYQQTAAI LTFFXTSCPP RSNPYQQYRR 

101 LRLYAFHAPE ITEFFVGFAF XVDARNVYAQ IGGDVGTHLR NMRREFGFLC 

151 NHGRIDIDRL PTLRLNALIR RTQKDAAVRI FELCGGVGEM AADIAQTCRT 

201 EQRVGNGVQQ RIGIGVSEQP FFKWDFNSAK YQLSAFGQLV DIVALSDTDV 

251 RHRLCS* 

ORF122a and ORF122-1 show 96.9% identity in 256 aa overlap: 



                      10        20        30        40        50        60
orf122a.pep   ISYWASSSLDFLEVDTAPLIFLPLLPKASMKKLMVEPVPMPMYSFSGTNSTAFSAAMRLS
              |||||||| |||||||||||||||||||||||||||||||||:||||||||||||||||||
orf122-1      ISYWASSSPDFLEVDTAPLIFLPLLPKASMKKLMVEPVPMPIYSFSGTNSTAFSAAMRLS
                      10        20        30        40        50        60

                      70        80        90       100       110       120
orf122a.pep   SSCVVIFLSFGKPYQQTAAILTFFXTSCPPRSNPYQQYRRLRLYAFHAPEITEFFVGFAF
              |||||||||||||||||||||||| |||||||| ||||||||||||| |||:||||||||
orf122-1      SSCVVIFLSFGKPYQQTAAILTFFCTSCPPRSNAYQQYRRLRLYAFHPPEIAEFFVGFAF
                      70        80        90       100       110       120

                     130       140       150       160       170       180
orf122a.pep   XVDARNVYAQIGGDVGTHLRNMRREFGFLCNHGRIDIDRLPTLRLNALIRRTQKDAAVRI
               ||||||||||||||||||||:||||||||||||||||||||||||||||||||||||||
orf122-1      DVDARNVYAQIGGDVGTHLRNVRREFGFLCNHGRIDIDRLPTLRLNALIRRTQKDAAVRI
                     130       140       150       160       170       180

                     190       200       210       220       230       240
orf122a.pep   FELCGGVGEMAADIAQTCRTEQRVGNGVQQRIGIGVSEQPFFKWDFNSAKYQLSAFGQLV
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf122-1      FELCGGVGEMAADIAQTCRTEQRVGNGVQQRIGIGVSEQPFFKWDFNSAKYQLSAFGQLV
                     190       200       210       220       230       240

                     250
orf122a.pep   DIVALSDTDVRHRLCSX
              |||||||||||||||||
orf122-1      DIVALSDTDVRHRLCSX
                     250



Homology with a predicted ORF from N. gonorrhoeae 

ORF122 shows 89.6% identity over a 182 aa overlap with a predicted ORF (ORF122ng) from 
N. gonorrhoeae: 

orf122.pep                                TAFSAALRLSPSXLVIFLSFGKPYQQTAAI 30
                                          ||||||:||| | :||||||||||||||||
orf122ng    FLPLLPKASMKKLMVEPVPMPMYSFSGTNSTAFSAAMRLSSSCVVIFLSFGKPYQQTAAI 80

orf122.pep  LTFFCTSCPPRSNAYQQYRRLRLYAFHPPEIAEFFVGFAFDVDARNVYAQIGGDVGTHLR 90
            ||||||| ||||| |||||||||||||||||||||||||||:||||: :|||||||||||
orf122ng    LTFFCTSWPPRSNPYQQYRRLRLYAFHPPEIAEFFVGFAFDIDARNIDTQIGGDVGTHLR 140

orf122.pep  NVRRECGFLCNHGRIDIDRLPTLRLNALIRRTQKDAAVRIFELCGGVGEMAADIAQTCRT 150
            ||| | ||||||||||||:|||||||||||||||||||||||||||||:||||:||||||
orf122ng    NVRCEFGFLCNHGRIDIDHLPTLRLNALIRRTQKDAAVRIFELCGGVGKMAADVAQTCRT 200

orf122.pep  EQRVGNGVQQRIGIGVSEQPFFKWDFNSAKYQ 182
            |||||||||||:|| : ||||||||||||||||
orf122ng    EQRVGNGVQQRVGIRMPEQPFFKWDFNSAKYQLSAFGQLVDIVALSDTDIRHRLCS 256

The complete length ORF122ng nucleotide sequence <SEQ ID 797> is: 

1 ATGTCGTACC GGGCAAGCAG TTCGCCGGAT TTTTTGGAGG TTGAAACCGC 

51 GCCTTTGATT TTTTTACCGC TTTTGCCCAA GGCTTCGATG AAGAAATTGa 

101 tgGTCGAACC GgtaCCGATG CCGATGTATT CGTTTTCGGG TACGAATTCG 

151 ACTGCTTTTT CGGCGGCGAT GCGCttgAgt TCgtcttgcg TcgTCATATT 

201 TTTAtccttt gGGAAaccct atcaAcaAAc agccgccatC TTAACATTTT 

251 TTTGCACGtC ctggccgccg cgttcaAATc cgtaccaGca ataccgccgc 

301 ctgcgcctCT AtgcCTTCCA TCCGCCCGAG ATAGCCGAGT TTTTCGTTGG 

351 TTTTGCCTTT GATatTGACG CACGAAATAT CGatacCCAa atcggcgGCG 

401 ATGTTGGCAC GCATTTGCGG AATGTGCGGT GCGAGTTTGG GTTTCTGTGC 

451 AATCACGGTC GTATCGACAT TGACCACCTG CCAACCCTGC GCCTGAACGC 

501 TTTGATACGC CGCACGCAAA AGGACGCGGC TGTCCGCATC TTTGAACTCT 

551 GCGGCGGTGT CGGGAAAATG GCTGCCGATG TCGCCCAAAC CTGCCGCACC 

601 GAGCAGCgcg tcggtaaCGG CGTGCAGCAG cgcgTcgGCA TCCGAATGCC 

651 CGAGCAGCCC TTTTTCAAAT GGGATTTCAA CTCCGCCAAG TATCAGCTTT 

701 CTGCCTTCGG TCAATTGGTG GACATCGTAG CCCTGTCCGA TACGGATATT 

751 CGTCATCGTT TGTGTTCCTG A 

This encodes a protein having amino acid sequence <SEQ ID 798>: 



1 MSYRASSSPD FLEVETAPLI FLPLLPKASM KKLMVEPVPM PMYSFSGTNS 

51 TAFSAAMRLS SSCVVIFLSF GKPYQQTAAI LTFFCTSWPP RSNPYQQYRR 

101 LRLYAFHPPE IAEFFVGFAF DIDARNIDTQ IGGDVGTHLR NVRCEFGFLC 

151 NHGRIDIDHL PTLRLNALIR RTQKDAAVRI FELCGGVGKM AADVAQTCRT 

201 EQRVGNGVQQ RVGIRMPEQP FFKWDFNSAK YQLSAFGQLV DIVALSDTDI 

251 RHRLCS* 
ORF122ng and ORF122-1 show 92.6% identity in 256 aa overlap: 

                      10        20        30        40        50        60
orf122-1.pep  ISYWASSSPDFLEVDTAPLIFLPLLPKASMKKLMVEPVPMPIYSFSGTNSTAFSAAMRLS
              :|| ||||||||||:|||||||||||||||||||||||||||:||||||||||||||||||
orf122ng      MSYRASSSPDFLEVETAPLIFLPLLPKASMKKLMVEPVPMPMYSFSGTNSTAFSAAMRLS
                      10        20        30        40        50        60

                      70        80        90       100       110       120
orf122-1.pep  SSCVVIFLSFGKPYQQTAAILTFFCTSCPPRSNAYQQYRRLRLYAFHPPEIAEFFVGFAF
              ||||||||||||||||||||||||||| ||||| ||||||||||||||||||||||||||
orf122ng      SSCVVIFLSFGKPYQQTAAILTFFCTSWPPRSNPYQQYRRLRLYAFHPPEIAEFFVGFAF
                      70        80        90       100       110       120

                     130       140       150       160       170       180
orf122-1.pep  DVDARNVYAQIGGDVGTHLRNVRREFGFLCNHGRIDIDRLPTLRLNALIRRTQKDAAVRI
              |:||||: :|||||||||||||| ||||||||||||||:|||||||||||||||||||||
orf122ng      DIDARNIDTQIGGDVGTHLRNVRCEFGFLCNHGRIDIDHLPTLRLNALIRRTQKDAAVRI
                     130       140       150       160       170       180

                     190       200       210       220       230       240
orf122-1.pep  FELCGGVGEMAADIAQTCRTEQRVGNGVQQRIGIGVSEQPFFKWDFNSAKYQLSAFGQLV
              ||||||||:||||:|||||||||||||||||:|| : |||||||||||||||||||||||
orf122ng      FELCGGVGKMAADVAQTCRTEQRVGNGVQQRVGIRMPEQPFFKWDFNSAKYQLSAFGQLV
                     190       200       210       220       230       240

                     250
orf122-1.pep  DIVALSDTDVRHRLCSX
              |||||||||:|||||||
orf122ng      DIVALSDTDIRHRLCSX
                     250

Based on this analysis, it is predicted that the proteins from N. meningitidis and N. gonorrhoeae, and 
their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies. 



Example 95 



The following partial DNA sequence was identified in N. meningitidis <SEQ ID 799>: 



1 . . GCCGGCGCGA GTGCGAACAA CATTTCCGCG CGTTTTGCGG AAACACCCGT 

51 CGCTGTCAGC GTTACCCTGA TCGGCACGGT ACTTGCCGTC ATGCTGCCCG 

101 TTACCGAATA TGAAAACTTC CTGCTGCTTA TCGGCTCGGT ATTTGCGCCG 

151 ATGGGGCGGA XTTTGATTGC CGACTTTTTC GTCTTGAAAC GGCGTGA 

This corresponds to the amino acid sequence <SEQ ID 800; ORF125>: 

1 . .AGASANNISA RFAETPVAVS VTLIGTVLAV MLPVTEYENF LLLIGSVFAP 

51 MGGFDCRLFR LETA* 



Further work revealed the complete nucleotide sequence <SEQ ID 801>: 

1 ATGTCGGGCA ATGCCTCCTC TCCTTCATCT TCCTCCGCCA TCGGGCTGAT 

51 TTGGTTCGGC GCGGCGGTAT CGATTGCCGA AATCAGCACG GGTACGCTGC 

101 TTGCGCCTTT GGGCTGGCAG CGCGGTCTGG CGGCTCTACT TTTGGGTCAT 

151 GCCGTCGGCG GCGCGCTGTT TTTTGCGGCG GCGTATATCG GCGCACTGAC 

201 CGGACGCAGC TCGATGGAAA GCGTGCGCCT GTCGTTCGGC AAACGCGGTT 

251 CAGTGCTGTT TTCCGTGGCG AATATGCTGC AACTGGCCGG CTGGACGGCG 

301 GTGATGATTT ACGCCGGCGC AACGGTCAGC TCCGCTTTGG GCAAAGTGTT 

351 GTGGGACGGC GAATCTTTTG TCTGGTGGGC ATTGGCAAAC GGCGCGCTGA 

401 TTGTGCTGTG GCTGGTTTTC GGCGCACGCA AAACAGGCGG GCTGAAAACC 

451 GTTTCGATGC TGCTGATGCT GTTGGCGGTT CTGTGGCTGA GTGCCGAAGT 

501 CTTTTCCACG GCAGGCAGCA CCGCCGCACA GGTTTCAGAC GGCATGAGTT 

551 TCGGAACGGC AGTCGAGCTG TCCGCCGTGA TGCCGCTTTC CTGGCTGCCG 

601 CTTGCCGCCG ACTACACGCG CCACGCGCGC CGCCCGTTTG CGGCAACCCT 

651 GACGGCAACG CTCGCCTACA CGCTGACCGG CTGCTGGATG TATGCCTTGG 

701 GTTTGGCAGC GGCGTTGTTC ACCGGAGAAA CCGACGTGGC AAAAATCCTG 

751 CTGGGCGCAG GTTTGGGTGC GGCAGGCATT TTGGCGGTCG TCCTCTCCAC 
801 CGTTACCACA ACGTTTCTCG ATGCCTATTC CGCCGGCGCG AGTGCGAACA 

851 ACATTTCCGC GCGTTTTGCG GAAACACCCG TCGCTGTCGG CGTTACCCTG 

901 ATCGGCACGG TACTTGCCGT CATGCTGCCC GTTACCGAAT ATGAAAACTT 

951 CCTGCTGCTT ATCGGCTCGG TATTTGCGCC GATGGCGGCG GTTTTGATTG 

1001 CCGACTTTTT CGTCTTGAAA CGGCGTGAGG AGATTGAAGG CTTTGACTTT 

1051 GCCGGACTGG TTCTGTGGCT TGCGGGCTTC ATCCTCTACC GCTTCCTGCT 

1101 CTCGTCCGGC TGGGAAAGCA GCATCGGTCT GACCGCCCCC GTAATGTCTG 

1151 CCGTTGCCAT TGCCACCGTA TCGGTACGCC TTTTCTTTAA AAAAACCCAA 

1201 TCTTTACAAA GGAACCCGTC ATGA 

This corresponds to the amino acid sequence <SEQ ID 802; ORF125-1>: 

1 MSGNASSPSS SSAIGLIWFG AAVSIAEIST GTLLAPLGWQ RGLAALLLGH 

51 AVGGALFFAA AYIGALTGRS SMESVRLSFG KRGSVLFSVA NMLQLAGWTA 

101 VMIYAGATVS SALGKVLWDG ESFVWWALAN GALIVLWLVF GARKTGGLKT 

151 VSMLLMLLAV LWLSAEVFST AGSTAAQVSD GMSFGTAVEL SAVMPLSWLP 

201 LAADYTRHAR RPFAATLTAT LAYTLTGCWM YALGLAAALF TGETDVAKIL 

251 LGAGLGAAGI LAVVLSTVTT TFLDAYSAGA SANNISARFA ETPVAVGVTL 

301 IGTVLAVMLP VTEYENFLLL IGSVFAPMAA VLIADFFVLK RREEIEGFDF 

351 AGLVLWLAGF ILYRFLLSSG WESSIGLTAP VMSAVAIATV SVRLFFKKTQ 

401 SLQRNPS* 

Computer analysis of this amino acid sequence gave the following results: 



Homology with a predicted ORF from N. meningitidis (strain A) 

ORF125 shows 76.5% identity over a 51 aa overlap with an ORF (ORF125a) from strain A of N. 
meningitidis: 

                                                    10        20        30
orf125.pep                                    AGASANNISARFAETPVAVSVTLIGTVLAV
                                            ||:|||||||:::| |:||:|:::||:|||
orf125a       KILLGAGLGAAGILAVVLSTVTTTFLDAYSAGVSANNISAKLSEIPIAVAVAVVGTLLAV
              250       260       270       280       290       300

                      40        50        60
orf125.pep    MLPVTEYENFLLLIGSVFAPMGGFDCRLFRLETAX
              :||||||||||||||||||||
orf125a       LLPVTEYENFLLLIGSVFAPMAAVLIADFFVLKRREEIEG
              310       320       330       340

The ORF125a partial nucleotide sequence <SEQ ID 803> is: 



1 ATGTCGGGCA ATGCCTCCTC TCNTTCATCT TCCGCCGCCA TCGGGCTGAT 

51 TTGGTTCGGC GCGGCGGTAT CGATTGCCGA AATCAGCACG GGTACACTGC 

101 TTGCGCCTTT GGGCTGGCAG CGCGGTCTGG CNGCTCTGCT TTTGGGTCAT 

151 GCCGTCGGCG GCGCGCTGTT TTTTGCGGCG GCGTATATCG GCGCACTGAC 

201 CGGACNCANC TCGATGGAAA GCGTGCGCCT GTCGTTCGGC AAACGCGGTT 

251 CAGTGCTGTT TTCCGTGGCG AATATGCTGC AACTGGCCGG CTGGACGGCG 

301 GTGATGATTT ACGCCGGCGC AACGGTCAGC TCCGCTTTGG GCAAAGTGTT 

351 GTGGGACGGC GAATCTTTTG TCTGGTGGGC ATTGGCAAAC GGCGCGCTGA 

401 TTGTGCTGTG GCTGGTTTTC GGCGCACGCA AAACAGGCGG GCTGAAAACC 

451 GTTTCGATGC TGCTGATGCT GTTGGCGGTT CTGTGGCTGA GTGCCGAANT 

501 NTTTTCCACG GCAGGCAGCA CCGCCGCANN GGTNNCAGAC GGCATGAGTT 

551 TCGGAACGGC AGTCGAGCTG TCCGCCGTNA TGCCGCTTTC TTGGCTGCCG 

601 CTGGCCGCCG ACTACACGCG CCACGCGCGC CGCCCGTTTG CGGCAACCCT 

651 GACGGCAACG CTCGCCTACA CGCTGACCGG CTGCTGGATG TATGCCTTGG 

701 GTTTGGCAGC GGCGTTGTTC ACCGGAGAAA CCGACGTGGC AAAAATCCTG 

751 CTGGGCGCAG GTTTGGGTGC GGCAGGCATT TTGGCGGTCG TCCTGTCGAC 

801 CGTTACCACC ACTTTTCTCG ATGCNTACTC CGCCGGCGTA AGTGCCAACA 

851 ATATTTCCGC CAAACTTTCG GAAATACCNA TCGCCGTTGC CGTCGCCGTT 

901 GTCGGCACAC TGCTTGCCGT CCTCCTGCCC GTTACCGAAT ATGAAAACTT 

951 CCTGCTGCTT ATCGGCTCGG TATTTGCGCC GATGGCGGCG GTTTTGATTG 

1001 CCGACTTTTT CGTCTTGAAA CGGCGTGAGG AGATTGAAGG C.. 



This encodes a protein having the partial amino acid sequence <SEQ ID 804>: 



1 MSGNASSXSS SAAIGLIWFG AAVSIAEIST GTLLAPLGWQ RGLAALLLGH 

51 AVGGALFFAA AYIGALTGXX SMESVRLSFG KRGSVLFSVA NMLQLAGWTA 

101 VMIYAGATVS SALGKVLWDG ESFVWWALAN GALIVLWLVF GARKTGGLKT 

151 VSMLLMLLAV LWLSAEXFST AGSTAAXVXD GMSFGTAVEL SAVMPLSWLP 

201 LAADYTRHAR RPFAATLTAT LAYTLTGCWM YALGLAAALF TGETDVAKIL 

251 LGAGLGAAGI LAVVLSTVTT TFLDAYSAGV SANNISAKLS EIPIAVAVAV 

301 VGTLLAVLLP VTEYENFLLL IGSVFAPMAA VLIADFFVLK RREEIEG.. 

ORF125a and ORF125-1 show 94.5% identity in 347 aa overlap: 



                      10        20        30        40        50        60
orf125a.pep   MSGNASSXSSSAAIGLIWFGAAVSIAEISTGTLLAPLGWQRGLAALLLGHAVGGALFFAA
              ||||||| |||:||||||||||||||||||||||||||||||||||||||||||||||||
orf125-1      MSGNASSPSSSSAIGLIWFGAAVSIAEISTGTLLAPLGWQRGLAALLLGHAVGGALFFAA
                      10        20        30        40        50        60

                      70        80        90       100       110       120
orf125a.pep   AYIGALTGXXSMESVRLSFGKRGSVLFSVANMLQLAGWTAVMIYAGATVSSALGKVLWDG
              ||||||||  ||||||||||||||||||||||||||||||||||||||||||||||||||
orf125-1      AYIGALTGRSSMESVRLSFGKRGSVLFSVANMLQLAGWTAVMIYAGATVSSALGKVLWDG
                      70        80        90       100       110       120

                     130       140       150       160       170       180
orf125a.pep   ESFVWWALANGALIVLWLVFGARKTGGLKTVSMLLMLLAVLWLSAEXFSTAGSTAAXVXD
              |||||||||||||||||||||||||||||||||||||||||||||| ||||||||| | |
orf125-1      ESFVWWALANGALIVLWLVFGARKTGGLKTVSMLLMLLAVLWLSAEVFSTAGSTAAQVSD
                     130       140       150       160       170       180

                     190       200       210       220       230       240
orf125a.pep   GMSFGTAVELSAVMPLSWLPLAADYTRHARRPFAATLTATLAYTLTGCWMYALGLAAALF
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf125-1      GMSFGTAVELSAVMPLSWLPLAADYTRHARRPFAATLTATLAYTLTGCWMYALGLAAALF
                     190       200       210       220       230       240

                     250       260       270       280       290       300
orf125a.pep   TGETDVAKILLGAGLGAAGILAVVLSTVTTTFLDAYSAGVSANNISAKLSEIPIAVAVAV
              |||||||||||||||||||||||||||||||||||||||:|||||||:::| |:||:|::
orf125-1      TGETDVAKILLGAGLGAAGILAVVLSTVTTTFLDAYSAGASANNISARFAETPVAVGVTL
                     250       260       270       280       290       300

                     310       320       330       340
orf125a.pep   VGTLLAVLLPVTEYENFLLLIGSVFAPMAAVLIADFFVLKRREEIEG
              :||:|||:|||||||||||||||||||||||||||||||||||||||
orf125-1      IGTVLAVMLPVTEYENFLLLIGSVFAPMAAVLIADFFVLKRREEIEGFDFAGLVLWLAGF
                     310       320       330       340       350       360

Homology with a predicted ORF from N. gonorrhoeae 

ORF125 shows 86.2% identity over a 65 aa overlap with a predicted ORF (ORF125ng) from 
N. gonorrhoeae: 

orf125.pep                                AGASANNISARFAETPVAVSVTLIGTVLAV 30
                                          |||||||||||||| |||| |||| |||||
orf125ng    KILLGAGLGITGILAVVLSTVTTTFLDTYSAGASANNISARFAEIPVAVGVTLIRTVLAV 308

orf125.pep  MLPVTEYENFLLLIGSVFAPM-GGFDCRLFRLETA 64
            |||||||:|||||| |||:|| |||||||| |:||
orf125ng    MLPVTEYKNFLLLIRSVFGPMAGGFDCRLFCLKTA 343
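You can reproduce the 86.2% identity reported for ORF125 and ORF125ng from the two aligned blocks. This sketch assumes that the overlap is the number of aligned columns, including the single gap column, and that only identical residue pairs count as identities. Under that assumption the blocks give 56 identities over 65 columns. The helper name `percent_identity` is illustrative.

```python
def percent_identity(top: str, bottom: str) -> float:
    """Percent identity over all aligned columns ('-' marks a gap)."""
    if len(top) != len(bottom):
        raise ValueError("aligned rows must have the same length")
    identical = sum(1 for a, b in zip(top, bottom) if a == b and a != "-")
    return 100 * identical / len(top)


# Aligned part of the two blocks (ORF125 on top, ORF125ng below)
orf125 = "AGASANNISARFAETPVAVSVTLIGTVLAV" + "MLPVTEYENFLLLIGSVFAPM-GGFDCRLFRLETA"
orf125ng = "AGASANNISARFAEIPVAVGVTLIRTVLAV" + "MLPVTEYKNFLLLIRSVFGPMAGGFDCRLFCLKTA"
print(len(orf125), round(percent_identity(orf125, orf125ng), 1))  # 65 86.2
```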



An ORF125ng nucleotide sequence <SEQ ID 805> was predicted to encode a protein having amino 
acid sequence <SEQ ID 806>: 



1 MSGNASSPSS SAAIGLVWFG AAVSIAEIST GTLLAPLGWQ RGLAALLLGH 

51 AVGGALFFAA AYIGALTGRS SMESVRLSFG KCGSVLFSVA NMLQLAGWTA 

101 VMIYVGATVS SALGKVLWDG ESFVWWALAN GALIVLWLVF GARRTGGLKT 

151 VSMLLMLLAV LWLSVEVFAS SGTNAAPAVS DGMTFGTAVE LSAVMPLSWL 

201 PLAADYTRQA RRPFAATLTA TLAYTLTGCW MYALGLAAAL FTGETDVAKI 

251 LLGAGLGITG ILAVVLSTVT TTFLDTYSAG ASANNISARF AEIPVAVGVT 

301 LIRTVLAVML PVTEYKNFLL LIRSVFGPMA GGFDCRLFCL KTA* 
Further work revealed the following gonococcal DNA sequence <SEQ ID 807>: 

1 ATGTCGGGCA ATGCCTCCTC TCCTTCATCT TCCGCCGCCA TCGGGCTGGT 

51 TTGGTTCGGC GCGGCGGTAT CGATTGCCGA AATCAGCACG GGTACGCTGC 

101 TCGCCCCCTT GGGCTGGCAG CGCGGTCTGG CGGCCCTGCT TTTGGGTCAT 

151 GCCGTCGGCG GCGCGCTGTT TTTTGCGGCG GCGTATATCG GCGCACTGAC 

201 CGGACGCAGC TCGATGGAAA GTGTGCGCCT GTCGTTCGGC AAATGCGGTT 

251 CAGTGCTGTT TTCCGTGGCG AATATGCTGC AACTGGCCGG CTGGACGGCG 

301 GTGATGATTT ACGTCGGCGC AACGGTCAGC TCCGCTTTGG GCAAAGTGTT 

351 GTGGGACGGC GAATCCTTTG TCTGGTGGGC ATTGGCAAAC GGCGCACTGA 

401 TCGTGCTGTG GCTGGTTTTC GGCGCACGCA GAACGGGCGG GCTGAAAACC 

451 GTTTCGATGC TGCTGATGCT GCTTGCCGTG TTGTGGTTGA GCGTCGAAGT 

501 GTTCGCTTCG TCCGGCACAA ACGCCGCGCC CGCCGTTTCA GACGGCATGA 

551 CCTTCGGAAC GGCAGTCGAA CTGTCCGCCG TCATGCCGCT TTCCTGGCTG 

601 CCGCTGGCCG CCGACTACAC GCGCCAAGCA CGCCGCCCGT TTGCGGCAAC 

651 CCTGACGGCA ACGCTCGCCT ATACGCTGAC GGGCTGCTGG ATGTATGCCT 

701 TGGGTTTGGC GGCGGCTCTG TTTACCGGAG AAACCGACGT GGCGAAAATC 

751 CTGTTGGGCG CGGGCTTGGG CATAACGGGC ATTCTGGCAG TCGTCCTCTC 

801 CACCGTTACC ACAACGTTTC TCGATACCTA TTCCGCCGGC GCGAGTGCGA 

851 ACAACATTTC CGCGCGTTTT GCGGAAATAC CCGTCGCTGT CGGCGTTACC 

901 CTGATCGGCA CGGTGCTTGC CGTCATGCTG CCCGTTACCG AATATAAAAA 

951 CTTCCTGCTG CTTATCGGCT CGGTATTTGC GCCGATGGCG GCGGTTTTGA 

1001 TTGCCGACTT TTTCGTCTTA AAACGGCGTG AGGAGATTGA AGGCTTTGAC 

1051 TTTGCCGGAC TGGTTCTGTG GCTGGCAGGC TTCATCCTCT ACCGCTTCCT 

1101 GCTCTCGTCC GGTTGGGAAA GCAGCATCGG TCTGACCGCC CCCGTAATGT 

1151 CTGCCGTTGC CATTGCCACC GTATCGGTAC GCCTTTTCTT TAAAAAAACC 

1201 CAATCTTTAC AAAGGAACCC GTCATGA 

This corresponds to the amino acid sequence <SEQ ID 808; ORF125ng-1>: 

1 MSGNASSPSS SAAIGLVWFG AAVSIAEIST GTLLAPLGWQ RGLAALLLGH 

51 AVGGALFFAA AYIGALTGRS SMESVRLSFG KCGSVLFSVA NMLQLAGWTA 

101 VMIYVGATVS SALGKVLWDG ESFVWWALAN GALIVLWLVF GARRTGGLKT 

151 VSMLLMLLAV LWLSVEVFAS SGTNAAPAVS DGMTFGTAVE LSAVMPLSWL 

201 PLAADYTRQA RRPFAATLTA TLAYTLTGCW MYALGLAAAL FTGETDVAKI 

251 LLGAGLGITG ILAVVLSTVT TTFLDTYSAG ASANNISARF AEIPVAVGVT 

301 LIGTVLAVML PVTEYKNFLL LIGSVFAPMA AVLIADFFVL KRREEIEGFD 

351 FAGLVLWLAG FILYRFLLSS GWESSIGLTA PVMSAVAIAT VSVRLFFKKT 

401 QSLQRNPS* 

ORF125ng-1 and ORF125-1 show 95.1% identity in 408 aa overlap: 

                      10        20        30        40        50        60
orf125-1.pep  MSGNASSPSSSSAIGLIWFGAAVSIAEISTGTLLAPLGWQRGLAALLLGHAVGGALFFAA
              |||||||||||:||||:|||||||||||||||||||||||||||||||||||||||||||
orf125ng-1    MSGNASSPSSSAAIGLVWFGAAVSIAEISTGTLLAPLGWQRGLAALLLGHAVGGALFFAA
                      10        20        30        40        50        60

                      70        80        90       100       110       120
orf125-1.pep  AYIGALTGRSSMESVRLSFGKRGSVLFSVANMLQLAGWTAVMIYAGATVSSALGKVLWDG
              ||||||||||||||||||||| ||||||||||||||||||||||:|||||||||||||||
orf125ng-1    AYIGALTGRSSMESVRLSFGKCGSVLFSVANMLQLAGWTAVMIYVGATVSSALGKVLWDG
                      70        80        90       100       110       120

                     130       140       150       160       170       179
orf125-1.pep  ESFVWWALANGALIVLWLVFGARKTGGLKTVSMLLMLLAVLWLSAEVFSTAGSTAAQ-VS
              |||||||||||||||||||||||:||||||||||||||||||||:|||:::|::||  ||
orf125ng-1    ESFVWWALANGALIVLWLVFGARRTGGLKTVSMLLMLLAVLWLSVEVFASSGTNAAPAVS
                     130       140       150       160       170       180

            180       190       200       210       220       230      239
orf125-1.pep  DGMSFGTAVELSAVMPLSWLPLAADYTRHARRPFAATLTATLAYTLTGCWMYALGLAAAL
              |||:||||||||||||||||||||||||:|||||||||||||||||||||||||||||||
orf125ng-1    DGMTFGTAVELSAVMPLSWLPLAADYTRQARRPFAATLTATLAYTLTGCWMYALGLAAAL
                     190       200       210       220       230       240

            240       250       260       270       280       290      299
orf125-1.pep  FTGETDVAKILLGAGLGAAGILAVVLSTVTTTFLDAYSAGASANNISARFAETPVAVGVT
              ||||||||||||||||| :||||||||||||||||:|||||||||||||||| |||||||
orf125ng-1    FTGETDVAKILLGAGLGITGILAVVLSTVTTTFLDTYSAGASANNISARFAEIPVAVGVT
                     250       260       270       280       290       300

            300       310       320       330       340       350      359
orf125-1.pep  LIGTVLAVMLPVTEYENFLLLIGSVFAPMAAVLIADFFVLKRREEIEGFDFAGLVLWLAG
              |||||||||||||||:||||||||||||||||||||||||||||||||||||||||||||
orf125ng-1    LIGTVLAVMLPVTEYKNFLLLIGSVFAPMAAVLIADFFVLKRREEIEGFDFAGLVLWLAG
                     310       320       330       340       350       360

            360       370       380       390       400
orf125-1.pep  FILYRFLLSSGWESSIGLTAPVMSAVAIATVSVRLFFKKTQSLQRNPSX
              |||||||||||||||||||||||||||||||||||||||||||||||||
orf125ng-1    FILYRFLLSSGWESSIGLTAPVMSAVAIATVSVRLFFKKTQSLQRNPSX
                     370       380       390       400

Based on this analysis, including the presence of a putative leader sequence and transmembrane 
domains in the gonococcal protein, it is predicted that the proteins from N. meningitidis and 
N. gonorrhoeae, and their epitopes, could be useful antigens for vaccines or diagnostics, or for 
raising antibodies. 



Example 96 



The following partial DNA sequence was identified in N. meningitidis <SEQ ID 809>: 

1 ATGACCCGTA TCGCCATCCT CGGCGGCGGC CTCTCGGGAA GGCTGACCGC 

51 GTTGCAGCTT GCAGAACAAG GTTATCAGAT TGCACTTTTC GATAAAAGCT 

101 GCCGCCGGGG CGAACACGCC GCCGCCTATG TAGCCGCCGC CATGCTCGCG 

151 CCTGCAGCGG A.ACGGTCGA AGCCACGCCC GAAGTGGTCA GGCTGGGCAG 

201 GCAGAGCATC CCGCTTTGGC GCGGCATCCG ATGCCGTCTG AACACGCACA 

251 CGATGATGCA GGAAAACGGC AGCCTGATTG TATGGCACGG GCAGGACAAG 

301 CCATTATCCA GCGAGTTCGT CCGCCATCTC AAACGCGGCG GCGT.ACGGA 

351 TGACGAAATC GTCCGTTGGC GCGCCGACGA CATCGCCGAA CGCGAACCGC 

401 AACTCGGCGG ACGTTTTTAA GACGGCATCT ACCTGCCGAC CGAAGC.CAG 

451 CTCGACGGGC GGCAATTATA GTCTGCACTT GCCGACGCTT TGGACGAACT 

501 GAACGTCCCC TGCCATTGGG AACACGAATG CGTCCCCGAA GCCTGCAAG. . 

This corresponds to the amino acid sequence <SEQ ID 810; ORF126>: 



1 MTRIAILGGG LSGRLTALQL AEQGYQIALF DKSCRRGEHA AAYVAAAMLA 

51 PAAXTVEATP EVVRLGRQSI PLWRGIRCRL NTHTMMQENG SLIVWHGQDK 

101 PLSSEFVRHL KRGGXTDDEI VRWRADDIAE REPQLGGRFX DGIYLPTEXQ 

151 LDGRQLXSAL ADALDELNVP CHWEHECVPE ACK. . . 

Further work revealed the complete nucleotide sequence <SEQ ID 811>: 



1 ATGACCCGTA TCGCCATCCT CGGCGGCGGC CTCTCGGGAA GGCTGACCGC 

51 GTTGCAGCTT GCAGAACAAG GTTATCAGAT TGCACTTTTC GATAAAGGCT 

101 GCCGCCGGGG CGAACACGCC GCCGCCTATG TTGCCGCCGC CATGCTCGCG 

151 CCTGCGGCGG AAGCGGTCGA AGCCACGCCC GAAGTGGTCA GGCTGGGCAG 

201 GCAGAGCATC CCGCTTTGGC GCGGCATCCG ATGCCGTCTG AACACGCACA 

251 CGATGATGCA GGAAAACGGC AGCCTGATTG TGTGGCACGG GCAGGACAAG 

301 CCATTATCCA GCGAGTTCGT CCGCCATCTC AAACGCGGCG GCGTAGCGGA 

351 TGACGAAATC GTCCGTTGGC GCGCCGACGA CATCGCCGAA CGCGAACCGC 

401 AACTCGGCGG ACGTTTTTCA GACGGCATCT ACCTGCCGAC CGAAGGCCAG 

451 CTCGACGGGC GGCAAATATT GTCTGCACTT GCCGACGCTT TGGACGAACT 

501 GAACGTCCCC TGCCATTGGG AACACGAATG CGTCCCCGAA GGCCTGCAAG 

551 CCCAATACGA CTGGCTGATC GACTGCCGCG GCTACGGCGC AAAAACCGCG 

601 TGGAACCAAT CCCCCGAGCA CACCAGCACC CTGCGCGGCA TACGCGGCGA 

651 AGTGGCGCGG GTTTACACAC CCGAAATCAC GCTCAACCGC CCCGTGCGTC 

701 TGCTCCATCC GCGTTATCCG CTCTACATCG CCCCGAAAGA AAACCACGTC 

751 TTCGTCATCG GCGCGACCCA AATCGAAAGC GAAAGCCAAG CCCCCGCCAG 

801 CGTGCGTTCA GGGTTGGAAC TCTTGTCCGC ACTCTATGCC ATCCACCCCG 

851 CCTTCGGCGA AGCCGACATC CTCGAAATCG CCACCGGCCT GCGCCCCACG 

901 CTCAACCACC ACAACCCCGA AATCCGTTAC AACCGCGCCC GACGCCTGAT 

951 TGAAATCAAC GGCCTTTTCC GCCACGGTTT CATGATCTCC CCCGCCGTAA 

1001 CCGCCGCCGC CGCCAGATTG GCAGTGGCAC TGTTTGACGG AAAAGACGCG 

1051 CCCGAACGCG ATAAAGAAAG CGGTTTGGCG TATATCCGAA GACAAGATTA 

1101 A 
This corresponds to the amino acid sequence <SEQ ID 812; ORF126-1>: 

1 MTRIAILGGG LSGRLTALQL AEQGYQIALF DKGCRRGEHA AAYVAAAMLA 

51 PAAEAVEATP EVVRLGRQSI PLWRGIRCRL NTHTMMQENG SLIVWHGQDK 

101 PLSSEFVRHL KRGGVADDEI VRWRADDIAE REPQLGGRFS DGIYLPTEGQ 

151 LDGRQILSAL ADALDELNVP CHWEHECVPE GLQAQYDWLI DCRGYGAKTA 

201 WNQSPEHTST LRGIRGEVAR VYTPEITLNR PVRLLHPRYP LYIAPKENHV 

251 FVIGATQIES ESQAPASVRS GLELLSALYA IHPAFGEADI LEIATGLRPT 

301 LNHHNPEIRY NRARRLIEIN GLFRHGFMIS PAVTAAAARL AVALFDGKDA 

351 PERDKESGLA YIRRQD* 

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A) 

ORF126 shows 90.0% identity over a 180 aa overlap with an ORF (ORF126a) from strain A of N. 
meningitidis: 

                      10        20        30        40        50        60
orf126.pep    MTRIAILGGGLSGRLTALQLAEQGYQIALFDKSCRRGEHAAAYVAAAMLAPAAXTVEATP
              ||||||||||||||||||||||||||||||||:|||||||||||||||||||| :|||||
orf126a       MTRIAILGGGLSGRLTALQLAEQGYQIALFDKGCRRGEHAAAYVAAAMLAPAAEAVEATP
                      10        20        30        40        50        60

                      70        80        90       100       110       120
orf126.pep    EVVRLGRQSIPLWRGIRCRLNTHTMMQENGSLIVWHGQDKPLSSEFVRHLKRGGXTDDEI
              |||||||| |||||||||:|:| :|| ||||||||||||||||:|||||||||| :|| |
orf126a       EVVRLGRQXIPLWRGIRCHLKTPAMMXENGSLIVWHGQDKPLSNEFVRHLKRGGVADDXI
                      70        80        90       100       110       120

                     130       140       150       160       170       180
orf126.pep    VRWRADDIAEREPQLGGRFXDGIYLPTEXQLDGRQLXSALADALDELNVPCHWEHECVPE
              ||||||||||||||||||| |||||||| ||||||: ||||||||||||||||||||:||
orf126a       VRWRADDIAEREPQLGGRFSDGIYLPTEGQLDGRQILSALADALDELNVPCHWEHECAPE
                     130       140       150       160       170       180

The complete length ORF126a nucleotide sequence <SEQ ID 813> is: 

1 ATGACCCGTA TCGCCATCCT CGGCGGCGGC CTCTCNGGAA GGCTGACCGC 

51 ACTGCAGCTT GCAGAACAAG GTTATCAGAT TGCACTTTTC GATAAAGGCT 

101 GCCGCCGGGG CGAACACGCC GCCGCCTATG TTGCCGCCGC CATGCTCGCG 

151 CCTGCGGCGG AAGCGGTCGA AGCCACGCCT GAAGTGGTCA GGCTGGGCAG 

201 GCAGANCATC CCGCTTTGGC GCGGCATCCG ATGCCATCTG AAAACGCCTG 

251 CCATGATGCA NGAAAACGGC AGCCTGATTG TGTGGCACGG GCAGGACAAA 

301 CCTTTATCCA ACGAGTTCGT CCGCCATCTC AAACGCGGCG GCGTAGCGGA 

351 TGACNAAATC GTCCGTTGGC GCGCCGACGA CATCGCCGAA CGCGAACCGC 

401 AACTCGGCGG ACGTTTTTCA GACGGCATCT ACCTGCCGAC CGAAGGCCAG 

451 CTCGACGGGC GGCAAATATT GTCTGCACTT GCCGACGCTT TGGACGAACT 

501 GAACGTCCCC TGCCATTGGG AACACGAATG TGCCCCCGAA GACTTGCAAG 

551 CCCAATACGA CTGGCTGATC GACTGCCGCG GCTACGGCGC AAAAACCGCG 

601 TGGAACCAAT CCCCCGANNA NACCAGCACC CTGCGCGGCA TACGCGGCGA 

651 AGTGGCGCGG GTTTACACAC CCGAAATCAC GCTCAACCGC CCCGTGCGCC 

701 TGCTACACCC GCGCTATCCG CTNTACATCG CCCCGAAAGA AAACCNCGTC 

751 TTCGTCATCG GCGCGACCCA AATCGAAAGC GAAAGCCAAG CACCTGCCAG 

801 CGTGCGTTCC GGGCTGGAAC TCTTATCCGC ACTCTATGCC GTCCACCCCG 

851 CCTTCGGCGA AGCCGACATC CTCGAAATCG CCACCGGCCT GCGCCCCACG 

901 CTCAATCACC ACAACCCCGA AATCCGTTAC AACCGCGCCC GACGCCTGAT 

951 TGAAATCAAC GGCCTTTTCC GCCACGGTTT CATGATCTCC CCCGCCGTAA 

1001 CCGCCGCCGC CGTCAGATTG GCAGTGGCAC TGTTTGACGG AAAAGANGCG 

1051 CCCGAACGCG ATGAAGAAAG CGGTTTGGCG TATATCCGAA GACAAGATTA 

1101 A 

This encodes a protein having amino acid sequence <SEQ ID 814>: 

1 MTRIAILGGG LSGRLTALQL AEQGYQIALF DKGCRRGEHA AAYVAAAMLA 

51 PAAEAVEATP EVVRLGRQXI PLWRGIRCHL KTPAMMXENG SLIVWHGQDK

101 PLSNEFVRHL KRGGVADDXI VRWRADDIAE REPQLGGRFS DGIYLPTEGQ 

151 LDGRQILSAL ADALDELNVP CHWEHECAPE DLQAQYDWLI DCRGYGAKTA 

201 WNQSPXXTST LRGIRGEVAR VYTPEITLNR PVRLLHPRYP LYIAPKENXV 
251 FVIGATQIES ESQAPASVRS GLELLSALYA VHPAFGEADI LEIATGLRPT
301 LNHHNPEIRY NRARRLIEIN GLFRHGFMIS PAVTAAAVRL AVALFDGKXA
351 PERDEESGLA YIRRQD* 
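Each protein listing follows from its nucleotide listing by the standard genetic code. Where a codon contains an unresolved base N, the listings give an amino acid if that amino acid is the same whichever base N stands for, and X otherwise. Thus TCN at nucleotides 34-36 of <SEQ ID 813> appears as S, CTN at 721-723 as L, and ANC at 205-207 as X. The minimal Python sketch below reproduces this behaviour. It assumes reading frame 1 and the standard codon assignments, and it is an illustration only, not the program used to prepare these listings; `translate` and `translate_codon` are hypothetical names.

```python
from itertools import product

# Standard codon assignments. Codons are enumerated with bases in T, C, A, G order,
# so the first base varies slowest and the third base fastest.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"  # TTT ... TGG
    "LLLLPPPPHHQQRRRR"  # CTT ... CGG
    "IIIMTTTTNNKKSSRR"  # ATT ... AGG
    "VVVVAAAADDEEGGGG"  # GTT ... GGG
)
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_codon(codon: str) -> str:
    """Translate one codon; an 'N' is resolved only if every expansion agrees."""
    if any(base not in "ACGTN" for base in codon):
        return "X"
    # Replace each N by all four bases and collect the resulting amino acids.
    expansions = product(*("ACGT" if base == "N" else base for base in codon))
    amino_acids = {CODON_TABLE["".join(c)] for c in expansions}
    return amino_acids.pop() if len(amino_acids) == 1 else "X"


def translate(dna: str) -> str:
    """Translate a listing in frame 1; position numbers and spaces are discarded."""
    seq = "".join(ch for ch in dna.upper() if ch.isalpha())
    usable = len(seq) - len(seq) % 3  # ignore trailing bases that do not fill a codon
    return "".join(translate_codon(seq[i:i + 3]) for i in range(0, usable, 3))


print(translate("1 ATGACCCGTA TCGCCATCCT CGGCGGCGGC CTCTCNGGAA GGCTGACCGC"))
# MTRIAILGGGLSGRLT
```

Applied to the first line of <SEQ ID 813>, the sketch yields the first sixteen residues of <SEQ ID 814>, including the S encoded by TCN.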

ORF126a and ORF126-1 show 95.4% identity in 366 aa overlap: 



10 20 30 40 50 60 

orf126a.pep MTRIAILGGGLSGRLTALQLAEQGYQIALFDKGCRRGEHAAAYVAAAMLAPAAEAVEATP
I I I I I I I I I I I II I I I I I I I I I I I ! I I I I t I I I I I I I I I I I I I I I I I I 1 I I I I I I I I I I I 
orf126-1 MTRIAILGGGLSGRLTALQLAEQGYQIALFDKGCRRGEHAAAYVAAAMLAPAAEAVEATP

10 20 30 40 50 60 

70 80 90 100 110 120 

orf126a.pep EVVRLGRQXIPLWRGIRCHLKTPAMMXENGSLIVWHGQDKPLSNEFVRHLKRGGVADDXI
I I I I I I I I 111111111:1:1 : I I I I II I I I I I I I I I I I I : I I I I I I I I I I I I I I I 
orf126-1 EVVRLGRQSIPLWRGIRCRLNTHTMMQENGSLIVWHGQDKPLSSEFVRHLKRGGVADDEI

70 80 90 100 110 120 



130 140 150 160 170 180 

orf126a.pep VRWRADDIAEREPQLGGRFSDGIYLPTEGQLDGRQILSALADALDELNVPCHWEHECAPE
I I I I I I 1 I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I: I I 
orf126-1 VRWRADDIAEREPQLGGRFSDGIYLPTEGQLDGRQILSALADALDELNVPCHWEHECVPE

130 140 150 160 170 180 



190 200 210 220 230 240 

orf126a.pep DLQAQYDWLIDCRGYGAKTAWNQSPXXTSTLRGIRGEVARVYTPEITLNRPVRLLHPRYP
I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I II I I I I I I II I I I I I I I I I I I I I II I 
orf126-1 GLQAQYDWLIDCRGYGAKTAWNQSPEHTSTLRGIRGEVARVYTPEITLNRPVRLLHPRYP

190 200 210 220 230 240 



250 260 270 280 290 300 

orf126a.pep LYIAPKENXVFVIGATQIESESQAPASVRSGLELLSALYAVHPAFGEADILEIATGLRPT
III I MM I M I I M I M I M II I II I M M M I I I I M : M I M I I I I I I I II M M I 
orf126-1 LYIAPKENHVFVIGATQIESESQAPASVRSGLELLSALYAIHPAFGEADILEIATGLRPT

250 260 270 280 290 300 

310 320 330 340 350 360 

orf126a.pep LNHHNPEIRYNRARRLIEINGLFRHGFMISPAVTAAAVRLAVALFDGKXAPERDEESGLA
M I I I I I I I II I I I I I I I I I I I I I I I I II II I I I M I : I I I I I I I I I I I I I I I : I I I I I 
orf126-1 LNHHNPEIRYNRARRLIEINGLFRHGFMISPAVTAAAARLAVALFDGKDAPERDKESGLA

310 320 330 340 350 360 



orf126a.pep YIRRQDX
1 II I I I I 

orf126-1 YIRRQDX

Homology with a predicted ORF from N. gonorrhoeae 

ORF126 shows 90% identity over a 180 aa overlap with a predicted ORF (ORF126ng) from 
N. gonorrhoeae: 

orf126.pep MTRIAILGGGLSGRLTALQLAEQGYQIALFDKSCRRGEHAAAYVAAAMLAPAAXTVEATP 60

II I II : II M I I II I II I I I M M M I Mil: I : I I II I I I I I I M I I I I I : II II I 
orf126ng MTRIAVLGGGLSGRLTALQLAEQGYQIELFDKGTRQGEHAAAYVAAAMLAPAAEAVEATP 60

orf126.pep EVVRLGRQSIPLWRGIRCRLNTHTMMQENGSLIVWHGQDKPLSSEFVRHLKRGGXTDDEI 120

M : I I 1 I I I I I I I I I I II I I M I II I I II I I I I I I I M I I I I M I I I I I I I I I Mill 
orf126ng EVIRLGRQSIPLWRGIRCRLNTLTMMQENGSLIVWHGQDKPLSSEFVRHLKRGGVADDEI 120

orf126.pep VRWRADDIAEREPQLGGRFXDGIYLPTEXQLDGRQLXSALADALDELNVPCHWEHECVPE 180

I I I II I : M I I I I I M M I MMIIM Mill!: M I II II II I I M II I II I I : I : 
orf126ng VRWRADEIAEREPQLGGRFSDGIYLPTEGQLDGRQILSALADALDELNVPCHWEHECAPQ 180

An ORF126ng nucleotide sequence <SEQ ID 815> was predicted to encode a protein having amino 
acid sequence <SEQ ID 816>: 



1 MTRIAVLGGG LSGRLTALQL AEQGYQIELF DKGTRQGEHA AAYVAAAMLA 
51 PAAEAVEATP EVIRLGRQSI PLWRGIRCRL NTLTMMQENG SLIVWHGQDK

101 PLSSEFVRHL KRGGVADDEI VRWRADEIAE REPQLGGRFS DGIYLPTEGQ 

151 LDGRQILSAL ADALDELNVP CHWEHECAPQ DLQAQYDWVI DCRGYGAKTA 

201 WNQSPEHTST LRGIRGEVRG FTRPKSRSTA PCACCTRAIR STSPRKKTTS 

251 SSSARPKSKA KAKPPPAYVP GWNSYPRSMP STPPSAKPTS SKWRPGLRPT 

301 LNHHNPEIRY SRERRLIEIN GLFRHGFMIS PAVTAAAVRL AVALFDGKDA

351 PERDEESGLA YIGRQD* 

Further work revealed the following gonococcal DNA sequence <SEQ ID 817>: 

1 ATGACCCGTA TCGCCGTCCT CGGAGGCGGC CTTTCCGGAA GGCTGACCGC 

51 ATTGCAGCTT GCAGAACAAG GTTATCAGAT TGAACTTTTC GACAAGGGCA 

101 CCCGCCAAGG CGAACACGCC GCCGCCTATG TTGCCGCCGC GATGCTCGCG 

151 CCTGCGGCGG AAGCGGTCGA GGCAACGCCC GAAGTCATCA GGCTGGGCAG 

201 GCAGAGCATT CCGCTTTGGC GCGGCATCCG ATGCCGTCTG AACACGCTCA 

251 CGATGATGCA GGAAAACGGC AGCCTGATTG TGTGGCACGG GCAGGACAAG 

301 CCATTATCCA GCGAGTTCGT CCGCCATCTC AAACGCGGCG GCGTAGCGGA 

351 TGACGAAATC GTCCGTTGGC GCGCCGATGA AATCGCCGAA CGCGAACCGC 

401 AACTCGGCGG ACGTTTTTCA GACGGCATCT ACCTGCCGAC CGAAGGCCAG 

451 CTCGACGGGC GGCAAATATT GTCTGCACTT GCCGACGCTT TGGACGAACT 

501 GAACGTCCCT TGCCATTGGG AACACGAATG CGCCCCCCAA GACCTGCAAG 

551 CCCAATACGA CTGGGTAATC GACTGCCGGG GCTACGGCGC GAAAACCGCG 

601 TGGAACCAAT CCCCCGAGCA CACCAGCACC TTGCGCGGCA TACGCGGCGA 

651 AGTGGCGCGG GTTTACACGC CCGAAATCAC GCTCAACCGC CCCGTGCGCC 

701 TGCTGCACCC GCGCTATCCG CTCTACATCG CCCCGAAAGA AAACCACGTC 

751 TTCGTCATCG GCGCGACCCA AATCGAAAGC GAAAGCCAAG CCCCCGCCAG 

801 CGTACGTTCC GGGCTGGAAC TCTTATCCGC GCTCTATGCC GTCCACCCCG 

851 CCTTCGGCGA AGCCGACATC CTCGAAATCG CCGCCGGCCT GCGCCCCACG

901 CTCAACCACC ACAACCCCGA AATCCGCTAC AGCCGCGAAC GCCGCCTCAT 

951 CGAAATCAAC GGCCTTTTCC GGCACGGCTT TATGATTTCC CCCGCCGTAA 

1001 CCGCCGCCGC CGTCAGATTG GCAGTGGCAC TGTTTGACGG AAAAGACGCG 

1051 CCCGAACGTG ATGAAGAAAG CGGTTTGGCG TATATCGGAA GACAAGATTA 

1101 A 

This corresponds to the amino acid sequence <SEQ ID 818; ORF126ng-1>:

1 MTRIAVLGGG LSGRLTALQL AEQGYQIELF DKGTRQGEHA AAYVAAAMLA 

51 PAAEAVEATP EVIRLGRQSI PLWRGIRCRL NTLTMMQENG SLIVWHGQDK 

101 PLSSEFVRHL KRGGVADDEI VRWRADEIAE REPQLGGRFS DGIYLPTEGQ 

151 LDGRQILSAL ADALDELNVP CHWEHECAPQ DLQAQYDWVI DCRGYGAKTA 

201 WNQSPEHTST LRGIRGEVAR VYTPEITLNR PVRLLHPRYP LYIAPKENHV 

251 FVIGATQIES ESQAPASVRS GLELLSALYA VHPAFGEADI LEIAAGLRPT 

301 LNHHNPEIRY SRERRLIEIN GLFRHGFMIS PAVTAAAVRL AVALFDGKDA

351 PERDEESGLA YIGRQD* 

ORF126ng-1 and ORF126-1 show 95.1% identity in 366 aa overlap:

10 20 30 40 50 60

orf126-1.pep MTRIAILGGGLSGRLTALQLAEQGYQIALFDKGCRRGEHAAAYVAAAMLAPAAEAVEATP
orf126ng-1 MTRIAVLGGGLSGRLTALQLAEQGYQIELFDKGTRQGEHAAAYVAAAMLAPAAEAVEATP
10 20 30 40 50 60

70 80 90 100 110 120

orf126-1.pep EVVRLGRQSIPLWRGIRCRLNTHTMMQENGSLIVWHGQDKPLSSEFVRHLKRGGVADDEI
orf126ng-1 EVIRLGRQSIPLWRGIRCRLNTLTMMQENGSLIVWHGQDKPLSSEFVRHLKRGGVADDEI
70 80 90 100 110 120

130 140 150 160 170 180

orf126-1.pep VRWRADDIAEREPQLGGRFSDGIYLPTEGQLDGRQILSALADALDELNVPCHWEHECVPE
orf126ng-1 VRWRADEIAEREPQLGGRFSDGIYLPTEGQLDGRQILSALADALDELNVPCHWEHECAPQ
130 140 150 160 170 180

190 200 210 220 230 240

orf126-1.pep GLQAQYDWLIDCRGYGAKTAWNQSPEHTSTLRGIRGEVARVYTPEITLNRPVRLLHPRYP
orf126ng-1 DLQAQYDWVIDCRGYGAKTAWNQSPEHTSTLRGIRGEVARVYTPEITLNRPVRLLHPRYP
190 200 210 220 230 240

250 260 270 280 290 300

orf126-1.pep LYIAPKENHVFVIGATQIESESQAPASVRSGLELLSALYAIHPAFGEADILEIATGLRPT

I I I 1 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 1 I I I I I : I I I I I I I I I I I I I : I I I I I 
orf126ng-1 LYIAPKENHVFVIGATQIESESQAPASVRSGLELLSALYAVHPAFGEADILEIAAGLRPT
250 260 270 280 290 300



310 320 330 340 350 360 

orf126-1.pep LNHHNPEIRYNRARRLIEINGLFRHGFMISPAVTAAAARLAVALFDGKDAPERDKESGLA

lllilllllhl t I I I I I I I I I I I I M I I I I I I I I I : I I I I I i I I I I I M I I I : I I I I I 
orf126ng-1 LNHHNPEIRYSRERRLIEINGLFRHGFMISPAVTAAAVRLAVALFDGKDAPERDEESGLA

310 320 330 340 350 360 



orf126-1.pep YIRRQDX
II I I I I 

orf126ng-1 YIGRQDX

Furthermore, ORF126ng-l shows homology to a putative Rhizobium oxidase flavoprotein: 

gi|2627327 (AF004408) putative amino acid oxidase flavoprotein [Rhizobium etli]
Length = 327
Score = 169 bits (423), Expect = 3e-41

Identities = 112/329 (34%), Positives = 163/329 (49%), Gaps = 25/329 (7%) 

Query: 3 RIAVLGGGLSGRLTALQLAEQGYQIELFDKGTRQGEHXXXXXXXXXXXXXXXXXXXXXXX 62 

RI V G G++G A QL G+++ L ++ G 
Sbjct: 2 RILVNGAGVAGLTVAWQLYRHGFRVTLAERAGTVGA-GASGFAGGMLAPWCERESAEEPV 60

Query: 63 IRLGRQSIPLWRGIRCRLNTLTMMQENGSLIVWHGQDKPLSSEFVRHLKRGGVADDEIVR 122 

+ LGR + W + G+L+V G+D F R G DE+ 

Sbjct: 61 LTLGRLAADWWEAA L PGHVHRRGT LWAGGRDTGE LDR FSRRT S - GWEWLDEVA- 113 

Query: 123 WRADEIAEREPQLGGRFSDGIYLPTEGQLDGRQILSALADALDELNVPCHWEHECAPQDL 182 

IA EP L GRF ++ E LD RQ L+ALA L++ + + 
Sbjct: 114 IAALEPDLAGRFRRALFFRQEAHLDPRQALAALAAGLEDARMRLTLG WGES 165 

Query: 183 QAQYDWVIDCRGYGAKTAWNQSPEHTSTLRGIRGEVARVYTPEITLNRPVRLLHPRYPLY 242 

+D V+DC G LRG+RGE+ V T E++L+RPVRLLHPR+P+Y 

Sbjct: 166 DVDHDRWDCTGAA QIGRLPGLRGVRGEMLCVETTEVSLSRPVRLLHPRHPIY 218 

Query: 243 IAPKENHVFVIGATQIESESQAPASVRSGLELLSALYAVHPAFGEADILEIAAGLRPTLN 302 

I P++ + F++GAT IES+ P + RS +ELL+A YA+HPAFGEA + E AG+RP 
Sbjct: 219 IVPRDKNRFMVGATMIESDDGGPITARSLMELLNAAYAMHPAFGEARVTETGAGVRPAYP 278 

Query: 303 HHNPEIRYSRERRLIEINGLFRHGFMISP 331 

+ P R ++E R + +NGL+RHGF+++P 
Sbjct: 279 DNLP--RVTQEGRTLHVNGLYRHGFLLAP 305
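The bracketed percentages in the summary line above (Identities = 112/329 (34%), Positives = 163/329 (49%), Gaps = 25/329 (7%)) agree with truncating 100 × count / alignment length to a whole number rather than rounding it: 163/329 is about 49.5%, yet it is reported as 49%. The sketch below rebuilds that line from the raw counts. It assumes this truncation rule, which is inferred from these figures and not taken from BLAST documentation; `truncated_percent` and `summary_line` are illustrative names.

```python
def truncated_percent(count: int, length: int) -> int:
    """Percentage rounded down to a whole number (assumed rule), e.g. 163 of 329 -> 49."""
    return (100 * count) // length


def summary_line(identities: int, positives: int, gaps: int, length: int) -> str:
    """Rebuild an 'Identities / Positives / Gaps' summary from raw alignment counts."""
    fields = (("Identities", identities), ("Positives", positives), ("Gaps", gaps))
    return ", ".join(
        f"{name} = {count}/{length} ({truncated_percent(count, length)}%)"
        for name, count in fields
    )


print(summary_line(112, 163, 25, 329))
# Identities = 112/329 (34%), Positives = 163/329 (49%), Gaps = 25/329 (7%)
```

Ordinary rounding would have given 50% for the positives, which is why truncation is the more likely rule here.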

This analysis suggests that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes,
could be useful antigens for vaccines or diagnostics, or for raising antibodies.



Example 97 

The following DNA sequence, believed to be complete, was identified in N. meningitidis <SEQ ID 819>:

1 ATGACTGATA ATCGGGGGTT TACGCTGGTT GAATTAATAT CAGTGGTCTT 

51 GATATTGTCT GTACTTGCTT TAATTGTTTA TCCGAGCTAT CGCAATTATG 

101 TTGAGAAAGC AAAGATAAAT GCAGTGCGGG CAGCCTTGTT AGAAAATGCA 

151 CATTTTATGG AAAAGTTTTA TCTGCAGAAT GGGAGGTTTA AACAAACATC 

201 TACCAAGTGG CCAAGTTTGC CGATTAAAGA GGCAGAAGGC TTTTGTATCC 

251 GTTTGAATGG AATCGtCGCG CGGG. .GCTT TAGACAGTAA ATTCATGTTG 

301 AAGGCGGTAG CCATAGATAA AGATAAAAAT CCTTTTATTA TTAAGATGAA 

351 TGAAAATCTA GTAACCTTTA aTTTGCAAGA AGTCCGCCAG TTCGTGTAGT 

401 GACGGGCTGG ATTATTTTAA AGGAAATGAT AAGGACTGCA AGTTACTTAA 

451 GTAG 
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This corresponds to the amino acid sequence <SEQ ID 820; ORF127>: 

1 MTDNRGFTLV ELISVVLILS VLALIVYPSY RNYVEKAKIN AVRAALLENA
51 HFMEKFYLQN GRFKQTSTKW PSLPIKEAEG FCIRLNGIVA RXALDSKFML 
101 KAVAIDKDKN PFIIKMNENL VTFICKKSAS SCSDGLDYFK GNDKDCKLLK 
151 * 

Further work revealed the following DNA sequence <SEQ ID 821>:



1 ATGACTGATA ATCGGGGGTT TACGCTGGTT GAATTAATAT CAGTGGTCTT 

51 GATATTGTCT GTACTTGCTT TAATTGTTTA TCCGAGCTAT CGCAATTATG 

101 TTGAGAAAGC AAAGATAAAT GCAGTGCGGG CAGCCTTGTT AGAAAATGCA 

151 CATTTTATGG AAAAGTTTTA TCTGCAGAAT GGGAGGTTTA AACAAACATC 

201 TACCAAGTGG CCAAGTTTGC CGATTAAAGA GGCAGAAGGC TTTTGTATCC 

251 GTTTGAATGG AATCGCGCGC GGGGCTTTAG ACAGTAAATT CATGTTGAAG 

301 GCGGTAGCCA TAGATAAAGA TAAAAATCCT TTTATTATTA AGATGAATGA 

351 AAATCTAGTA ACCTTTATTT GCAAGAAGTC CGCCAGTTCG TGTAGTGACG 

401 GGCTGGATTA TTTTAAAGGA AATGATAAGG ACTGCAAGTT ACTTAAGTAG 

This corresponds to the amino acid sequence <SEQ ID 822; ORF127-1>:



1 MTDNRGFTLV ELISVVLILS VLALIVYPSY RNYVEKAKIN AVRAALLENA
51 HFMEKFYLQN GRFKQTSTKW PSLPIKEAEG FCIRLNGIAR GALDSKFMLK 
101 AVAIDKDKNP FIIKMNENLV TFICKKSASS CSDGLDYFKG NDKDCKLLK* 
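Describing a nucleotide sequence as a complete ORF implies several properties that can be checked mechanically: a start codon at the 5' end, a length that is a multiple of three, a stop codon at the 3' end, and no in-frame stop codon before it. The minimal sketch below applies those checks. The accepted start codons (ATG, GTG and TTG, the usual bacterial initiation codons) and the name `is_complete_orf` are illustrative choices, not taken from this specification.

```python
STOP_CODONS = frozenset({"TAA", "TAG", "TGA"})
# Assumption: accept the three common bacterial start codons.
START_CODONS = frozenset({"ATG", "GTG", "TTG"})


def is_complete_orf(dna: str) -> bool:
    """Check that a listing reads start codon ... stop codon in frame 1 with no earlier stop."""
    seq = "".join(ch for ch in dna.upper() if ch.isalpha())  # drop position numbers and spaces
    if len(seq) < 6 or len(seq) % 3 != 0:
        return False
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    if codons[0] not in START_CODONS or codons[-1] not in STOP_CODONS:
        return False
    # An in-frame stop before the last codon would truncate the protein.
    return not any(codon in STOP_CODONS for codon in codons[1:-1])


print(is_complete_orf("1 ATGAAATAG"))      # True
print(is_complete_orf("1 ATGTAGAAATAG"))   # False: internal stop
```

Passing these checks does not show that the start codon is the one actually used in vivo. It only shows that the listing is consistent with an uninterrupted reading frame.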

Computer analysis of this amino acid sequence gave the following results: 



Homology with a predicted ORF from N. meningitidis (strain A)

ORF127 shows 98.0% identity over a 150 aa overlap with an ORF (ORF127a) from strain A of N.
meningitidis: 

10 20 30 40 50 60 

orf127.pep MTDNRGFTLVELISVVLILSVLALIVYPSYRNYVEKAKINAVRAALLENAHFMEKFYLQN
I I I I 1 1 I I I I I I I I I I 1 1 I I I 1 1 1 1 1 1 I I I I I I I I I I 1 1 I : I I I I I I I I I 1 1 I I I I I I I I 
orf127a MTDNRGFTLVELISVVLILSVLALIVYPSYRNYVEKAKINTVRAALLENAHFMEKFYLQN

10 20 30 40 50 60 



70 80 90 100 110 120 

orf127.pep GRFKQTSTKWPSLPIKEAEGFCIRLNGIVARXALDSKFMLKAVAIDKDKNPFIIKMNENL
I I I I I I I I I I I I I I I I I I I I I I I I I I I 1 II I I I I I I I I I I I I I I I I I I I I II I ! I I I I 
orf127a GRFKQTSTKWPSLPIKEAEGFCIRLNGI-ARGALDSKFMLKAVAIDKDKNPFIIKMNENL

70 80 90 100 110 



130 140 150 

orf127.pep VTFICKKSASSCSDGLDYFKGNDKDCKLLKX
I I I I I I I I I I I I i I I I I II I I I I I I I I I I 1 I 
orf127a VTFICKKSASSCSDGLDYFKGNDKDCKLLKX
120 130 140 150 

The complete length ORF127a nucleotide sequence <SEQ ID 823> is:

1 ATGACTGATA ATCGGGGGTT TACGCTGGTT GAATTAATAT CAGTGGTCTT 

51 GATATTGTCT GTACTTGCTT TAATTGTTTA TCCGAGCTAT CGCAATTATG 

101 TTGAGAAAGC AAAGATAAAT ACAGTGCGGG CAGCCTTGTT AGAAAATGCA 

151 CATTTTATGG AAAAGTTTTA TCTGCAGAAT GGGAGATTTA AACAAACATC 

201 TACCAAATGG CCAAGTTTGC CGATTAAAGA GGCAGAAGGC TTTTGTATCC 

251 GTTTGAATGG AATCGCGCGC GGGGCCTTAG ACAGTAAATT CATGTTGAAG 

301 GCGGTAGCCA TAGATAAAGA TAAAAATCCT TTTATTATTA AGATGAATGA 

351 AAATCTAGTA ACCTTTATTT GCAAGAAGTC CGCCAGTTCG TGTAGTGACG 

401 GGCTGGATTA TTTTAAAGGA AATGATAAGG ACTGCAAGTT ACTTAAGTAG 

This encodes a protein having amino acid sequence <SEQ ID 824>: 






1 MTDNRGFTLV ELISVVLILS VLALIVYPSY RNYVEKAKIN TVRAALLENA
51 HFMEKFYLQN GRFKQTSTKW PSLPIKEAEG FCIRLNGIAR GALDSKFMLK 
101 AVAIDKDKNP FIIKMNENLV TFICKKSASS CSDGLDYFKG NDKDCKLLK* 



WO 99/24578 



ORF127a and ORF127-1 show 99.3% identity in 149 aa overlap:






10 20 30 40 50 60 

orf127a.pep MTDNRGFTLVELISVVLILSVLALIVYPSYRNYVEKAKINTVRAALLENAHFMEKFYLQN
M I I I I M I I I I I I i I I I I I I I I I N I I M I I N I I I I I i : I I I I i I I I I I M I I I I I I i 
orf127-1 MTDNRGFTLVELISVVLILSVLALIVYPSYRNYVEKAKINAVRAALLENAHFMEKFYLQN

10 20 30 40 50 60 

70 80 90 100 110 120 

orf127a.pep GRFKQTSTKWPSLPIKEAEGFCIRLNGIARGALDSKFMLKAVAIDKDKNPFIIKMNENLV

I | | | | | | I | | I I II I I I I.I I I I I I I I I 1 II I I I I II M I I I I I I I I I I I I I I I I I I I I I I 
orf127-1 GRFKQTSTKWPSLPIKEAEGFCIRLNGIARGALDSKFMLKAVAIDKDKNPFIIKMNENLV

70 80 90 100 110 120 

130 140 150 

orf127a.pep TFICKKSASSCSDGLDYFKGNDKDCKLLKX
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
orf127-1 TFICKKSASSCSDGLDYFKGNDKDCKLLKX

130 140 150 
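The identity figures quoted for these alignments are consistent with dividing the number of identical aligned residues by the number of aligned columns, with gap columns included. This gives 148/149 ≈ 99.3% for ORF127a and ORF127-1. For ORF127 and ORF127a it gives 147/150 = 98.0%, where one of the three mismatched columns is a gap. The sketch below shows that calculation, assuming two pre-aligned strings of equal length with '-' marking a gap. The alignment program's own conventions are not stated in the text, and `percent_identity` is a hypothetical name.

```python
from __future__ import annotations


def percent_identity(query: str, subject: str) -> tuple[int, int, float]:
    """Return (identical columns, overlap length, percent identity) for two aligned strings.

    Every aligned column counts towards the overlap, including columns where either
    sequence has a gap ('-'); a gap never counts as an identity.
    """
    if len(query) != len(subject):
        raise ValueError("aligned sequences must have the same length")
    identical = sum(1 for q, s in zip(query, subject) if q == s and q != "-")
    overlap = len(query)
    return identical, overlap, 100.0 * identical / overlap


# ORF127a and ORF127-1 differ only at the T/A column (residue 41).
orf127a = ("MTDNRGFTLVELISVVLILSVLALIVYPSYRNYVEKAKINTVRAALLENAHFMEKFYLQN"
           "GRFKQTSTKWPSLPIKEAEGFCIRLNGIARGALDSKFMLKAVAIDKDKNPFIIKMNENLV"
           "TFICKKSASSCSDGLDYFKGNDKDCKLLK")
orf127_1 = orf127a.replace("KINT", "KINA", 1)
identical, overlap, pct = percent_identity(orf127a, orf127_1)
print(identical, overlap, round(pct, 1))  # 148 149 99.3
```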



Homology with a predicted ORF from N. gonorrhoeae

ORF127 shows 97.3% identity over a 150 aa overlap with a predicted ORF (ORF127ng) from N. gonorrhoeae:

orf127.pep MTDNRGFTLVELISVVLILSVLALIVYPSYRNYVEKAKINAVRAALLENAHFMEKFYLQN 60
orf127ng MTDNRGFTLVELISVVLILSVLALIVYPSYRNYVEKAKINAVRAAFLENAHFMEKFYLQN 60

orf127.pep GRFKQTSTKWPSLPIKEAEGFCIRLNGIVARXALDSKFMLKAVAIDKDKNPFIIKMNENL 120
orf127ng GRFKQTSTKWPSLPIKEAEGFCIRLNGI-ARGALDSKFMLKAVAIDKDKNPFIIKMNENL 119

orf127.pep VTFICKKSASSCSDGLDYFKGNDKDCKLLK 150
orf127ng VTFICKKSASSCSDRLDYFKGNDKDCKLLK 149

The complete length ORF127ng nucleotide sequence <SEQ ID 825> is:

1 ATGACTGATA ATCGGGGGTT TACACTGGTT GAATTAATAT CAGTGGTCTT

51 GATATTGTCT GTACTTGCTT TAATTGTTTA TCCGAGCTAT CGCAATTATG

101 TTGAGAAAGC AAAGATAAAT GCAGTGCGGG CAGCCTTGTT AGAAAATGCA

151 CATTTTATGG AAAAGTTTTA TCTGCAGAAT GGGAGATTTA AACAAACATC

201 TACCAAATGG CCAAGTTTGC CGATTAAAGA GGCAGAAGGC TTTTGTATCC

251 GTTTGAATGG AATCGCGCGC GGGGCTTTAG ACAGTAAATT CATGTTGAAG

301 GCGGTAGCCA TAGATAAAGA TAAAAATCCT TTTATTATTA AGATGAATGA

351 AAATCTAGTA ACCTTTATTT GCAAGAAGTC CGCCAGTTCG TGTAGTGACG

401 GGCTGGATTA TTTTAAAGGA AATGATAAGG ACTGCAAGTT ACTTAAGTAG



This encodes a protein having amino acid sequence <SEQ ID 826>: 

1 MTDNRGFTLV ELISVVLILS VLALIVYPSY RNYVEKAKIN AVRAAFLENA
51 HFMEKFYLQN GRFKQTSTKW PSLPIKEAEG FCIRLNGIAR GALDSKFMLK 
101 AVAIDKDKNP FIIKMNENLV TFICKKSASS CSDRLDYFKG NDKDCKLLK* 

ORF127ng and ORF127-1 show 100.0% identity in 149 aa overlap: 



10 20 30 40 50 60

orf127-1.pep MTDNRGFTLVELISVVLILSVLALIVYPSYRNYVEKAKINAVRAALLENAHFMEKFYLQN
orf127ng-1 MTDNRGFTLVELISVVLILSVLALIVYPSYRNYVEKAKINAVRAALLENAHFMEKFYLQN

10 20 30 40 50 60

70 80 90 100 110 120

orf127-1.pep GRFKQTSTKWPSLPIKEAEGFCIRLNGIARGALDSKFMLKAVAIDKDKNPFIIKMNENLV
orf127ng-1 GRFKQTSTKWPSLPIKEAEGFCIRLNGIARGALDSKFMLKAVAIDKDKNPFIIKMNENLV
70 80 90 100 110 120




130 140 150 

orf127-1.pep TFICKKSASSCSDGLDYFKGNDKDCKLLKX
I I I I I I I I I II I I I I I I I I I I I I I I I I I I I 
orf127ng-1 TFICKKSASSCSDGLDYFKGNDKDCKLLKX

130 140 150 

This analysis, including the fact that the predicted transmembrane domain is shared by the 
meningococcal and gonococcal proteins, suggests that the proteins from N. meningitidis and
N. gonorrhoeae, and their epitopes, could be useful antigens for vaccines or diagnostics, or for
raising antibodies. 



Example 98 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 827>:

1 . . GTGTCGCTGG CTTCGGTGAT TGCCTCTCAA ATCTTCCTTT ACGAAGATTT 
51 CAACCAAATG CGGAAAACCC GTGGAGCTAT CTGCGGTTTT CTTGTCCAAT 
101 ATTTATCTGG GGTTTCAGCA GGGGTATTTC GATTTGAGTG CCGACGAGAA 
151 CCCCGTACTG CATATCTGGT CTTTGGCAGT AGAGGAACAG TATTACCTCC 
201 TGTATCCCCT TTTGCTGATA TTTTGCTGCA AAAAAACCAA ATCGCTACGG 
251 GTGCTGCGTA ACATCAGCAT CATCCTGTTT TTGATTTTGA CTGCCTCATC 
301 GTTTTTGCCA AGCGGGTTTT ATACCGACAT CCTCAACCAA CCCAATACTT 
351 ATTACCTTTC GACACTGAGG TTTCCCGAGC TGTTGGCAGG TTCGCTGCTG 
401 GCGGTTTACG GGCAAACGCA AAACGGCAGA CGGCAAACAG CAAATGGAAA 
451 ACGGCAGTTG CTTTCATCAC TCTGCTTCGG CGCATTGCTT GCCTGCCTGT 
501 TCGTGATTGA CAAACACAAT CCGTTTATCC CGGGAATGAC CCTGCTCCTT 
551 CCCTGCCTGC TGACGGCACT GCTTATCCGG AGTATGCAAT ACGGGACACT 
601 TCCGACCCGC ATCCTGTCGG CAAGCCCCAT CGTATTTGTC GGCAAAATCT 
651 CTTATTCCCT ATACCTGTAC CATTGGATTT TTATTGCTTT CGCTCCGCTC 
701 ATTAGAGGCG GGAAACAGCT CGGACTGCCT GCCG. . 

This corresponds to the amino acid sequence <SEQ ID 828; ORF128>: 

1 . . VSLASVIASQ IFLYEDFNQM RKTVELSAVF LSNIYLGFQQ GYFDLSADEN
51 PVLHIWSLAV EEQYYLLYPL LLIFCCKKTK SLRVLRNISI ILFLILTASS 
101 FLPSGFYTDI LNQPNTYYLS TLRFPELLAG SLLAVYGQTQ NGRRQTANGK 
151 RQLLSSLCFG ALLACLFVID KHNPFIPGMT LLLPCLLTAL LIRSMQYGTL 
201 PTRILSASPI VFVGKISYSL YLYHWIFIAF APLIRGGKQL GLPA. . 

Further work revealed the complete nucleotide sequence <SEQ ID 829>: 

1 ATGCAAGCTG TCCGATACAG ACCGGAAATT GACGGATTGC GGGCCGTCGC 

51 CGTGCTATCC GTCATGATTT TCCACCTGAA TAACCGCTGG CTGCCCGGAG 

101 GATTCCTGGG GGTGGACATT TTCTTTGTCA TCTCAGGATT CCTCATTACC 

151 GGCATCATTC TTTCTGAAAT ACAGAACGGT TCTTTTTCTT TCCGGGATTT 

201 TTATACCCGC AGGATTAAGC GGATTTATCC TGCCTTTATT GCGGCCGTGT

251 CGCTGGCTTC GGTGATTGCC TCTCAAATCT TCCTTTACGA AGATTTCAAC 

301 CAAATGCGGA AAACCGTGGA GCTTTCTGCG GTTTTCTTGT CCAATATTTA 

351 TCTGGGGTTT CAGCAGGGGT ATTTCGATTT GAGTGCCGAC GAGAACCCCG 

401 TACTGCATAT CTGGTCTTTG GCAGTAGAGG AACAGTATTA CCTCCTGTAT

451 CCCCTTTTGC TGATATTTTG CTGCAAAAAA ACCAAATCGC TACGGGTGCT 

501 GCGTAACATC AGCATCATCC TGTTTTTGAT TTTGACTGCC TCATCGTTTT 

551 TGCCAAGCGG GTTTTATACC GACATCCTCA ACCAACCCAA TACTTATTAC 

601 CTTTCGACAC TGAGGTTTCC CGAGCTGTTG GCAGGTTCGC TGCTGGCGGT 

651 TTACGGGCAA ACGCAAAACG GCAGACGGCA AACAGCAAAT GGAAAACGGC 

701 AGTTGCTTTC ATCACTCTGC TTCGGCGCAT TGCTTGCCTG CCTGTTCGTG 

751 ATTGACAAAC ACAATCCGTT TATCCCGGGA ATGACCCTGC TCCTTCCCTG 

801 CCTGCTGACG GCACTGCTTA TCCGGAGTAT GCAATACGGG ACACTTCCGA 

851 CCCGCATCCT GTCGGCAAGC CCCATCGTAT TTGTCGGCAA AATCTCTTAT 

901 TCCCTATACC TGTACCATTG GATTTTTATT GCTTTCGCCC ATTACATTAC 

951 AGGCGACAAA CAGCTCGGAC TGCCTGCCGT ATCGGCGGTT GCCGCGTTGA 

1001 CGGCCGGATT TTCCCTGTTG AGTTATTATT TGATTGAACA GCCGCTTAGA 

1051 AAACGGAAGA TGACCTTCAA AAAGGCATTT TTCTGCCTCT ATCTCGCCCC 

1101 GTCCCTGATA CTTGTCGGTT ACAACCTGTA CGCAAGGGGG ATATTGAAAC 

1151 AGGAACACCT CCGCCCGTTG CCCGGCGCGC CCCTTGCTGC GGAAAATCAT 
1201 TTTCCGGAAA CCGTCCTGAC CCTCGGCGAC TCGCACGCCG GACACCTGAG

1251 GGGGTTTCTG GATTATGTCG GCAGCCGGGA AGGGTGGAAA GCCAAAATCC

1301 TGTCCCTCGA TTCGGAGTGT TTGGTTTGGG TAGATGAGAA GCTGGCAGAC

1351 AACCCGTTAT GTCGAAAATA CCGGGATGAA GTTGAAAAAG CCGAAGCCGT

1401 TTTCATTGCC CAATTCTATG ATTTGAGGAT GGGCGGCCAG CCTGTGCCGA

1451 GATTTGAAGC GCAATCCTTC CTAATACCCG GGTTCCCAGC CCGATTCAGG

1501 GAAACCGTCA AAAGGATAGC CGCCGTCAAA CCCGTCTATG TTTTTGCAAA

1551 CAACACATCA ATCAGCCGTT CGCCCCTGAG GGAGGAAAAA TTGAAAAGAT

1601 TTGCCGCAAA CCAATATCTC CGCCCCATTC AGGCTATGGG CGACATCGGC

1651 AAGAGCAATC AGGCGGTCTT TGATTTGATT AAAGATATTC CCAATGTGCA

1701 TTGGGTGGAC GCACAAAAAT ACCTGCCCAA AAACACGGTC GAAATATACG

1751 GCCGCTATCT TTACGGCGAC CAAGACCACC TGACCTATTT CGGTTCTTAT

1801 TATATGGGGC GGGAATTCCA CAAACACGAA CGCCTGCTTA AATCTTCCCA

1851 CGGCGGCGCA TTGCAGTAG



This corresponds to the amino acid sequence <SEQ ID 830; ORF128-1>:



1 MQAVRYRPEI DGLRAVAVLS VMIFHLNNRW LPGGFLGVDI FFVISGFLIT

51 GIILSEIQNG SFSFRDFYTR RIKRIYPAFI AAVSLASVIA SQIFLYEDFN

101 QMRKTVELSA VFLSNIYLGF QQGYFDLSAD ENPVLHIWSL AVEEQYYLLY

151 PLLLIFCCKK TKSLRVLRNI SIILFLILTA SSFLPSGFYT DILNQPNTYY

201 LSTLRFPELL AGSLLAVYGQ TQNGRRQTAN GKRQLLSSLC FGALLACLFV

251 IDKHNPFIPG MTLLLPCLLT ALLIRSMQYG TLPTRILSAS PIVFVGKISY

301 SLYLYHWIFI AFAHYITGDK QLGLPAVSAV AALTAGFSLL SYYLIEQPLR

351 KRKMTFKKAF FCLYLAPSLI LVGYNLYARG ILKQEHLRPL PGAPLAAENH

401 FPETVLTLGD SHAGHLRGFL DYVGSREGWK AKILSLDSEC LVWVDEKLAD

451 NPLCRKYRDE VEKAEAVFIA QFYDLRMGGQ PVPRFEAQSF LIPGFPARFR

501 ETVKRIAAVK PVYVFANNTS ISRSPLREEK LKRFAANQYL RPIQAMGDIG

551 KSNQAVFDLI KDIPNVHWVD AQKYLPKNTV EIYGRYLYGD QDHLTYFGSY

601 YMGREFHKHE RLLKSSHGGA LQ*



Computer analysis of this amino acid sequence gave the following results: 

Homology with hypothetical integral membrane protein HI0392 of H. influenzae (accession number U32723)
ORF128 and HI0392 show 52% aa identity in 180 aa overlap:

Orf128: 1 VSLASVIASQIFLYEDFNQMRKTVELSAVFLSNIYLGFQQGYFDLSADENPVLHIWSLAV 60
++L S IAS IF+Y DFN++RKT+EL+ FLSN YLG QGYFDLSA+ENPVLHIWSLAV
HI0392: 46 MALVSFIASAIFIYNDFNKLRKTIELAIAFLSNFYLGLTQGYFDLSANENPVLHIWSLAV 105



Orf128: 61 EEQXXXXXXXXXIFCCKKTKSLRVLRNISIILFLILTASSFLPSGFYTDILNQPNTYYLS 120

E Q I KK + ++VL I++ILF IL A+SF+ + FY ++L+QPN YYLS 

HI0392: 106 EGQYYLIFPLILILAYKKFREVKVLFIITLILFFILLATSFVSANFYKEVLHQPNIYYLS 165 

Orf128: 121 TLRFPELLAGSLLAVYGQTQNGRRQTANGKRQLLSSLCFGALLACLFVIDKHNPFIPGMT 180

LRFPELL GSLLA+Y N + Q + +L+ L L +CLF+++ + FIPG+T 

HI0392: 166 NLRFPELLVGSLLAIYHNLSN-KVQLSKQVNNILAILSTLLLFSCLFLMNNNIAFIPGIT 224 

Homology with a predicted ORF from N. meningitidis (strain A) 

ORF128 shows 98.0% identity over a 244 aa overlap with an ORF (ORF128a) from strain A of N.
meningitidis: 

10 20 30 

orf128.pep VSLASVIASQIFLYEDFNQMRKTVELSAVF

I I I 1 I I I I I I I I I I I 1 I I I I I I I I I I I I I I 
orf128a ILSEIQNGSFSFRDFYTRRIKRIYPAFIAAVSLASVIASQIFLYEDFNQMRKTVELSAVF

60 70 80 90 100 110 

40 50 60 70 80 90 

orf128.pep LSNIYLGFQQGYFDLSADENPVLHIWSLAVEEQYYLLYPLLLIFCCKKTKSLRVLRNISI

1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 II 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 i 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 I 1 1 1 1 1 1 

orf128a LSNIYLGFQQGYFDLSADENPVLHIWSLAVEEQYYLLYPLLLIFCCKKTKSLRVLRNISI
120 130 140 150 160 170 

100 110 120 130 140 150 

orf128.pep ILFLILTASSFLPSGFYTDILNQPNTYYLSTLRFPELLAGSLLAVYGQTQNGRRQTANGK
I I I I I I I I : I I I I I I I I I I I I I I II I I I I I I I I I I I I I M I I I i I I I I I I I I I I I I I I I I 
orf128a ILFLILTATSFLPSGFYTDILNQPNTYYLSTLRFPELLAGSLLAVYGQTQNGRRQTANGK
180 190 200 210 220 230 

160 170 180 190 200 210 

orf128.pep RQLLSSLCFGALLACLFVIDKHNPFIPGMTLLLPCLLTALLIRSMQYGTLPTRILSASPI
I I I I M I I II I I I I I I I I I I I I I I I I I I I I I 11 ! I I I I I I I I I M ! I I 1 I I I I I I M i I ! 
orf128a RQLLSSLCFGALLACLFVIDKHNPFIPGMTLLLPCLLTALLIRSMQYGTLPTRILSASPI
240 250 260 270 280 290 



220 230 240 

orf128.pep VFVGKISYSLYLYHWIFIAFAPLIRGGKQLGLPA
I I I I I I I I I I I I I I I I I 1 I I I I I I I I 1 I i 1 
orf128a VFVGKISYSLYLYHWIFIAFAHYITGDKQLGLPAVSAVAALTAGFSLLSYYLIEQPLRKR
300 310 320 330 340 350 



orf128a KMTFKKAFFCLYLAPSLILVGYNLYARGILKQEHLRPLPGAPLAAENHFPETVLTLGDSH
360 370 380 390 400 410 

The complete length ORF128a nucleotide sequence <SEQ ID 831> is:



1 ATGCAAGCTG TCCGATACAG ACCGGAAATT GACGGATTGC GGGCCGTCGC 

51 CGTGCTATCC GTCATGATTT TCCACCTGAA TAACCGCTGG CTGCCCGGAG 

101 GATTCCTGGG GGTGGACATT TTCTTTGTCA TCTCAGGATT CCTCATTACC 

151 GGCATCATTC TTTCTGAAAT ACAGAACGGT TCTTTTTCTT TCCGGGATTT 

201 TTATACCCGC AGGATTAAGC GGATTTATCC TGCTTTTATT GCGGCCGTGT 

251 CGCTGGCTTC GGTGATTGCC TCTCAAATCT TCCTTTACGA AGATTTCAAC 

301 CAAATGCGGA AAACCGTGGA GCTTTCTGCG GTTTTCTTGT CCAATATTTA 

351 TCTGGGGTTT CAGCAGGGGT ATTTCGATTT GAGTGCCGAC GAGAACCCCG 

401 TACTGCATAT CTGGTCTTTG GCAGTAGAGG AACAGTATTA CCTCCTGTAT 

451 CCTCTTTTGC TGATATTTTG CTGCAAAAAA ACAAAATCGC TACGGGTGCT

501 GCGTAACATC AGCATCATCC TATTTCTGAT TTTGACTGCC ACATCGTTTT 

551 TGCCAAGCGG GTTTTATACC GATATTCTCA ACCAACCCAA TACTTATTAC 

601 CTTTCGACAC TGAGGTTTCC CGAGCTGTTG GCAGGTTCGC TGCTGGCGGT 

651 TTACGGGCAA ACGCAAAACG GCAGACGGCA AACAGCAAAT GGAAAACGGC 

701 AGTTGCTTTC ATCACTCTGC TTCGGCGCAT TGCTTGCCTG CCTGTTCGTG 

751 ATTGACAAAC ACAATCCGTT TATCCCGGGA ATGACCCTGC TCCTTCCCTG 

801 CCTGCTGACG GCACTGCTTA TCCGGAGTAT GCAATACGGG ACACTTCCGA 

851 CCCGCATCCT GTCGGCAAGC CCCATCGTAT TTGTCGGCAA AATCTCTTAT 

901 TCCCTATACC TGTACCATTG GATTTTTATT GCTTTCGCCC ATTACATTAC 

951 AGGCGACAAA CAGCTCGGAC TGCCTGCCGT ATCGGCGGTT GCCGCGTTGA 

1001 CGGCCGGATT TTCCCTGTTG AGTTATTATT TGATTGAACA GCCGCTTAGA 

1051 AAACGGAAGA TGACCTTCAA AAAGGCATTT TTCTGCCTCT ATCTCGCCCC 

1101 GTCCCTGATA CTTGTCGGTT ACAACCTGTA CGCAAGGGGG ATATTGAAAC 

1151 AGGAACACCT CCGCCCGTTG CCCGGCGCGC CCCTTGCTGC GGAAAATCAT 

1201 TTTCCGGAAA CCGTCCTGAC CCTCGGCGAC TCGCACGCCG GACACCTGCG 

1251 GGGGTTTCTG GATTATGTCG GCAGCCGGGA AGGGTGGAAA GCCAAAATCC 

1301 TGTCCCTCGA TTCGGAGTGT TTGGTTTGGG TAGATGAGAA GCTGGCAGAC 

1351 AACCCGTTAT GTCGAAAATA CCGGGATGAA GTTGAAAAAG CCGAAGCCGT 

1401 TTTCATTGCC CAATTCTATG ATTTGAGGAT GGGCGGCCAG CCCGTGCCGA 

1451 GATTTGAAGC GCAATCCTTC CTAATACCCG GGTTCCCAGC CCGATTCAGG 

1501 GAAACCGTCA AAAGGATAGC CGCCGTCAAA CCCGTCTATG TTTTTGCAAA 

1551 CAACACATCA ATCAGCCGTT CGCCCCTGAG GGAGGAAAAA TTGAAAAGAT 

1601 TTGCCGCAAA CCAATATCTC CGCCCCATTC AGGCTATGGG CGACATCGGC 

1651 AAGAGCAATC AGGCGGTCTT TGATTTGATT AAAGATATTC CCAATGTGCA 

1701 TTGGGTGGAC GCACAAAAAT ACCTGCCCAA AAACACGGTC GAAATATACG 

1751 GCCGCTATCT TTACGGCGAC CAAGACCACC TGACCTATTT CGGTTCTTAT 

1801 TATATGGGGC GGGAATTTCA CAAACACGAA CGCCTGCTTA AATCTTCTCG 

1851 CGACGGCGCA TTGCAGTAG 

This encodes a protein having amino acid sequence <SEQ ID 832>: 



1 MQAVRYRPEI DGLRAVAVLS VMIFHLNNRW LPGGFLGVDI FFVISGFLIT

51 GIILSEIQNG SFSFRDFYTR RIKRIYPAFI AAVSLASVIA SQIFLYEDFN

101 QMRKTVELSA VFLSNIYLGF QQGYFDLSAD ENPVLHIWSL AVEEQYYLLY 

151 PLLLIFCCKK TKSLRVLRNI SIILFLILTA TSFLPSGFYT DILNQPNTYY

201 LSTLRFPELL AGSLLAVYGQ TQNGRRQTAN GKRQLLSSLC FGALLACLFV

251 IDKHNPFIPG MTLLLPCLLT ALLIRSMQYG TLPTRILSAS PIVFVGKISY

301 SLYLYHWIFI AFAHYITGDK QLGLPAVSAV AALTAGFSLL SYYLIEQPLR

351 KRKMTFKKAF FCLYLAPSLI LVGYNLYARG ILKQEHLRPL PGAPLAAENH 

401 FPETVLTLGD SHAGHLRGFL DYVGSREGWK AKILSLDSEC LVWVDEKLAD 

451 NPLCRKYRDE VEKAEAVFIA QFYDLRMGGQ PVPRFEAQSF LIPGFPARFR 
501 ETVKRIAAVK PVYVFANNTS ISRSPLREEK LKRFAANQYL RPIQAMGDIG
551 KSNQAVFDLI KDIPNVHWVD AQKYLPKNTV EIYGRYLYGD QDHLTYFGSY 
601 YMGREFHKHE RLLKSSRDGA LQ* 
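Each line of a numbered listing such as the one above begins with the position of its first residue, so the numbering can serve as a checksum: a line that starts at 51 must follow exactly 50 residues. The sketch below joins a listing into a single string and rejects any line whose number does not match the residues read so far, which would expose dropped or duplicated characters. It assumes one position number per line, with everything after that number treated as sequence, so the '. .' continuation marks of partial listings would be counted as residues. `parse_listing` is a hypothetical name.

```python
from __future__ import annotations

import re
from typing import Iterable

_NUMBERED_LINE = re.compile(r"^\s*(\d+)\s+(.*)$")


def parse_listing(lines: Iterable[str], first: int = 1) -> str:
    """Join numbered listing lines into one sequence, verifying each position number."""
    residues: list[str] = []
    for line in lines:
        if not line.strip():
            continue  # blank separator lines carry no residues
        match = _NUMBERED_LINE.match(line)
        if match is None:
            raise ValueError(f"line has no position number: {line!r}")
        position = int(match.group(1))
        expected = first + len(residues)
        if position != expected:
            raise ValueError(f"line starts at {position}, expected {expected}")
        residues.extend(ch for ch in match.group(2) if not ch.isspace())
    return "".join(residues)


print(parse_listing(["51 PAAEAVEATP", "", "61 EVVRLGRQXI"], first=51))
# PAAEAVEATPEVVRLGRQXI
```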

ORF128a and ORF128-1 show 99.5% identity in 622 aa overlap: 






orf128a.pep MQAVRYRPEIDGLRAVAVLSVMIFHLNNRWLPGGFLGVDIFFVISGFLITGIILSEIQNG
I I I I I I I I I I i I I I I I I I I I I I I I I II I I I I I I I I I 1 I I I I I I I I I I I I I I I I I I I I I I I 
orf128-1 MQAVRYRPEIDGLRAVAVLSVMIFHLNNRWLPGGFLGVDIFFVISGFLITGIILSEIQNG

orf128a.pep SFSFRDFYTRRIKRIYPAFIAAVSLASVIASQIFLYEDFNQMRKTVELSAVFLSNIYLGF

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
orf128-1 SFSFRDFYTRRIKRIYPAFIAAVSLASVIASQIFLYEDFNQMRKTVELSAVFLSNIYLGF

orf128a.pep QQGYFDLSADENPVLHIWSLAVEEQYYLLYPLLLIFCCKKTKSLRVLRNISIILFLILTA

I I 1 1 I I 1 1 1 I 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 I 1 1 1 I 1 1 1 ! 1 1 I I 1 1 I 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 

orf128-1 QQGYFDLSADENPVLHIWSLAVEEQYYLLYPLLLIFCCKKTKSLRVLRNISIILFLILTA

orf128a.pep TSFLPSGFYTDILNQPNTYYLSTLRFPELLAGSLLAVYGQTQNGRRQTANGKRQLLSSLC
: I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I 
orf128-1 SSFLPSGFYTDILNQPNTYYLSTLRFPELLAGSLLAVYGQTQNGRRQTANGKRQLLSSLC

orf128a.pep FGALLACLFVIDKHNPFIPGMTLLLPCLLTALLIRSMQYGTLPTRILSASPIVFVGKISY
I I I I II I I I I I I II I II I I I I II I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I 
orf128-1 FGALLACLFVIDKHNPFIPGMTLLLPCLLTALLIRSMQYGTLPTRILSASPIVFVGKISY

orf128a.pep SLYLYHWIFIAFAHYITGDKQLGLPAVSAVAALTAGFSLLSYYLIEQPLRKRKMTFKKAF

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I i I I I I I I II I I I I I I I I I 
orf128-1 SLYLYHWIFIAFAHYITGDKQLGLPAVSAVAALTAGFSLLSYYLIEQPLRKRKMTFKKAF

orf128a.pep FCLYLAPSLILVGYNLYARGILKQEHLRPLPGAPLAAENHFPETVLTLGDSHAGHLRGFL

II I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I 
orf128-1 FCLYLAPSLILVGYNLYARGILKQEHLRPLPGAPLAAENHFPETVLTLGDSHAGHLRGFL

orf128a.pep DYVGSREGWKAKILSLDSECLVWVDEKLADNPLCRKYRDEVEKAEAVFIAQFYDLRMGGQ
I I I I I I I I I I I II I M I 1 1 I I 1 1 I II I I I 1 1 I I I II I I I I I I I I I I I I I I I I I I I 1 1 I I I 
orf128-1 DYVGSREGWKAKILSLDSECLVWVDEKLADNPLCRKYRDEVEKAEAVFIAQFYDLRMGGQ

orf128a.pep PVPRFEAQSFLIPGFPARFRETVKRIAAVKPVYVFANNTSISRSPLREEKLKRFAANQYL
I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I 
orf128-1 PVPRFEAQSFLIPGFPARFRETVKRIAAVKPVYVFANNTSISRSPLREEKLKRFAANQYL

orf128a.pep RPIQAMGDIGKSNQAVFDLIKDIPNVHWVDAQKYLPKNTVEIYGRYLYGDQDHLTYFGSY
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I 
orf128-1 RPIQAMGDIGKSNQAVFDLIKDIPNVHWVDAQKYLPKNTVEIYGRYLYGDQDHLTYFGSY

orf128a.pep YMGREFHKHERLLKSSRDGALQX
I I I I I I I I I I I I I I I I : Mill 
orf128-1 YMGREFHKHERLLKSSHGGALQX
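Because ORF128a and ORF128-1 have the same length and align without gaps, the 99.5% identity figure can be traced to individual substitutions. The sketch below lists the differing positions of two gap-free sequences of equal length. Applied to the two complete 622-residue sequences, it should report three differences: position 181 (T/S), 617 (R/H) and 618 (D/G). That gives 619/622 ≈ 99.5%. The name `substitutions` and its `start` numbering offset are illustrative, not part of the original analysis.

```python
from __future__ import annotations


def substitutions(a: str, b: str, start: int = 1) -> list[tuple[int, str, str]]:
    """List (position, residue in a, residue in b) wherever two ungapped sequences differ.

    `start` is the residue number of the first character, so a fragment taken from
    the middle of a protein can be reported in full-length coordinates.
    """
    if len(a) != len(b):
        raise ValueError("sequences must have the same length (no gaps)")
    return [(start + i, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]


# C-terminal fragments of ORF128a and ORF128-1 (residues 601-622).
print(substitutions("YMGREFHKHERLLKSSRDGALQ", "YMGREFHKHERLLKSSHGGALQ", start=601))
# [(617, 'R', 'H'), (618, 'D', 'G')]
```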






Homology with a predicted ORF from N. gonorrhoeae

ORF128 shows 93.4% identity over a 244 aa overlap with a predicted ORF (ORF128ng) from N.
gonorrhoeae: 

orf128.pep VSLASVIASQIFLYEDFNQMRKTVELSAVF 30

I III MM I II I M II Ml I I I 1:111:11 
orf128ng ILSEIQNGSFSFRDFYTRRIKRIYPAFIAAVSLASVIASQIFLYEDFNQMRKTIELSTVF 112

orf128.pep LSNIYLGFQQGYFDLSADENPVLHIWSLAVEEQYYLLYPLLLIFCCKKTKSLRVLRNISI 90

I I I I I M I : II I I II I I I II I I I I I I M I I I I I I I II I I I I I I I II I I II I I I II I II 
orf128ng LSNIYLGFRLGYFDLSADENPVLHIWSLAVEEQYYLLYPLLLIFCYKKTKSLRVLRNISI 172

orf128.pep ILFLILTASSFLPSGFYTDILNQPNTYYLSTLRFPELLAGSLLAVYGQTQNGRRQTANGK 150

I I I 1 I M I I I I I I : II I I I I I I I I I I I I I I I II II I I I : I I I I I I I I I I I I I II I I III 
orf128ng ILFLILTASSFLPAGFYTDILNQPNTYYLSTLRFPELLVGSLLAVYGQTQNGRRQTENGK 232

orf128.pep RQLLSSLCFGALLACLFVIDKHNPFIPGMTLLLPCLLTALLIRSMQYGTLPTRILSASPI 210

I I I I I I I I I I M : I I I I I I I I : I I I I I : I I I I I I I I I I I II I I I I I I I I I I I I I I I II I 
orf128ng RQLLSLLCFGALLVCLFVIDKHDPFIPGITLLLPCLLTALLIRSMQYGTLPTRILSASPI 292
orf128.pep VFVGKISYSLYLYHWIFIAFAPLIRGGKQLGLPA 244

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
orfl28ng VFVGKISYSLYLYHWIFIAFAHYITGDKQLGLPAVSAVAALTAGFSLLSYYLIEQPLRKR 352 

The full-length ORF128ng nucleotide sequence <SEQ ID 833> is:

1 ATGCAAGCTG TCCGATACAG GCCTGAAATT GACGGATTGC GGGCCGTCGC 

51 CGTGCTATCC GTCATTATTT TCCACCTGAA TAACCGCTGG CTGCCCGGAG 

101 GATTCCTGGG GGTGGACATT TTCTTTGTCA TCTCGGGATT CCTCATTACC 

151 AACATCATTC TTTCTGAAAT ACAGAACGGT TCTTTTTCTT TCCGGGATTT 

201 TTATACCCGC AGGATTAAGC GGATTTATCC TGCTTTTATT GCGGCCGTGT 

251 CCCTGGCTTC GGTGATTGCT TCTCAAATCT TCCTTTACGA AGATTTCAAC 

301 CAAATGAGGA AAACCATAGA GCTTTCTACG GTTTTTTTGT CCAATATTTA 

351 TTTGGGGTTC CGATTGGGGT ATTTCGATTT GAGTGCCGAC GAGAACCCCG 

401 TACTGCATAT CTGGTCTTTG GCGGTAGAGG AACAGTATTA CCTCCTGTAT 

451 CCTCTTTTGC TGATATTCTG TTACAAAAAA ACCAAATCAC TACGGGTGCT 

501 GCGTAATATC AGCATCATCC TGTTTCTGAT TTTGACCGCA TCATCGTTTT 

551 TGCCGGCCGG GTTTTATACC GACATCCTCA ACCAACCcaa TACTTATTAC

601 CTTTCGACAC TGAGGTTTCC CGAGCTGTTG GTGGGTTCGC TGTTGGCGGT 

651 TTACGGGCAA ACGCAAAACG GCAGACGGCA AACAGAAAAT GGAAAACGGC 

701 AGTTGCTTTC ATTACTCTGT TTCGGCGCat tgCTTGTCTG CCTGTTCGTG 

751 ATCGACAAAC ACGATCCGTT TATCCCGGGA ATAACCCTGC TCCTTCCCTG 

801 CCTGCTGACG GCGCTGCTTA TCCGGAGTAT GCAATACGGG ACACTTCCGA 

851 CCCGCATCCT GTCGGCAAGC CCCATCGTAT TTGTCGGCAA AATCTCTTAT 

901 TCCCTATACC TGTACCATTG GATTTTTATT GCCTTCGCCC ATTACATTAC 

951 AGGCGACAAA CAGCTCGGAC TGCCTGCCGT ATCGGCGGTT GCCGCGTTGA 

1001 CGGCCGGATT TTCCCTGTTG AGCTATTATT TGATTGAACA GCCGCTTAGA 

1051 AAACGGAAGA TGACCTTCAA AAAGGCATTT TTCTGCCTTT ATCTCGCCCC 

1101 GTCCCTGATG CTTGTCGGTT ACAACCTGTA TTCAAGAGGG ATATTGAAAC 

1151 AGGAACACCT CCGCCCGCTG CCCGGCACGC CCGTTGCTGC GGAAAATAAT 

1201 TTTCCGGAAA CCGTCTTGAC CCTCGGCGAC TCGCACGCCG GACACCTGCG 

1251 GGGGTTTCTG GATTATGTCG GCGGCAGGGA AGGGTGGAAA GCTAAAATCC 

1301 TGTCCCTCGA TTCGGAGTGT TTGGTTTGGG TGGATGAGAA GCTGGCAGAC 

1351 AACCCGTTGT GCCGAAAATA CCGGGATGAA GTTGAAAAAG CCGAAGCTGT 

1401 TTTCATTGCC CAATTCTATG ATTTGAGGAT GGGCGGCCAG CCCGTGCCGA 

1451 GATTTGAAGC GCAATCCTTC CTGATACCCG GGTTCAAAGC CCGATTCAGG

1501 GAAACCGTCA AGAGGATAGC CGCCGTCAAA CCTGTATATG TTTTTGCAAA 

1551 CAATACATCA ATCAGCCGTT CTCCCTTGAG GGAGGAAAAA TTGAAAAGAT 

1601 TTGCTATAAA CCAATACCTC CGGCCTATTC GGGCTATGGG CGACATCGGC 

1651 AAGAGCAATC AGGCGGTCTT TGATTTGGTT AAAGATATTC CCAATGTGCA 

1701 TTGGGTGGAC GCACAAAAAT ACCTGCCCAA AAACACGGTC GAAATACACG 

1751 GACGCTATCT TTACGGCGAC CAAGACCACC TGACCTATTT CGGTTCTTAT 

1801 TATATGGGGC GGGAATTTCA CAAACACGAA CGCCTGCTCA AGCATTCCCG 

1851 AGGCGGCGCA TTGCAGTAG 

This encodes a protein having amino acid sequence <SEQ ID 834>: 

1 MQAVRYRPEI DGLRAVAVLS VIIFHLNNRW LPGGFLGVDI FFVISGFLIT

51 NIILSEIQNG SFSFRDFYTR RIKRIYPAFI AAVSLASVIA SQIFLYEDFN

101 QMRKTIELST VFLSNIYLGF RLGYFDLSAD ENPVLHIWSL AVEEQYYLLY

151 PLLLIFCYKK TKSLRVLRNI SIILFLILTA SSFLPAGFYT DILNQPNTYY

201 LSTLRFPELL VGSLLAVYGQ TQNGRRQTEN GKRQLLSLLC FGALLVCLFV

251 IDKHDPFIPG ITLLLPCLLT ALLIRSMQYG TLPTRILSAS PIVFVGKISY

301 SLYLYHWIFI AFAHYITGDK QLGLPAVSAV AALTAGFSLL SYYLIEQPLR

351 KRKMTFKKAF FCLYLAPSLM LVGYNLYSRG ILKQEHLRPL PGTPVAAENN

401 FPETVLTLGD SHAGHLRGFL DYVGGREGWK AKILSLDSEC LVWVDEKLAD

451 NPLCRKYRDE VEKAEAVFIA QFYDLRMGGQ PVPRFEAQSF LIPGFKARFR

501 ETVKRIAAVK PVYVFANNTS ISRSPLREEK LKRFAINQYL RPIRAMGDIG

551 KSNQAVFDLV KDIPNVHWVD AQKYLPKNTV EIHGRYLYGD QDHLTYFGSY

601 YMGREFHKHE RLLKHSRGGA LQ*

ORF128ng and ORF128-1 show 95.7% identity in a 622 aa overlap:

orf128-1.pep MQAVRYRPEIDGLRAVAVLSVMIFHLNNRWLPGGFLGVDIFFVISGFLITGIILSEIQNG
Ml MIMII I I Mill II M:MM II II IIIMI I II I II I till 111:11 M II III
orf128ng MQAVRYRPEIDGLRAVAVLSVIIFHLNNRWLPGGFLGVDIFFVISGFLITNIILSEIQNG

orf128-1.pep SFSFRDFYTRRIKRIYPAFIAAVSLASVIASQIFLYEDFNQMRKTVELSAVFLSNIYLGF
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I : I II : I I I I II I I I I
orf128ng SFSFRDFYTRRIKRIYPAFIAAVSLASVIASQIFLYEDFNQMRKTIELSTVFLSNIYLGF

orf128-1.pep QQGYFDLSADENPVLHIWSLAVEEQYYLLYPLLLIFCCKKTKSLRVLRNISIILFLILTA
: I I I I I I I I I I I I I I I I I I I I I I I I I ! I I I I I I I I I I I I I I I I I I I I I I 1 I I I I I I I I
orf128ng RLGYFDLSADENPVLHIWSLAVEEQYYLLYPLLLIFCYKKTKSLRVLRNISIILFLILTA

orf128-1.pep SSFLPSGFYTDILNQPNTYYLSTLRFPELLAGSLLAVYGQTQNGRRQTANGKRQLLSSLC
I I I I I : I I I I II I I I I I I I I I I I I I I I I I M : I I I! II I II I I M I I I I illlllli il
orf128ng SSFLPAGFYTDILNQPNTYYLSTLRFPELLVGSLLAVYGQTQNGRRQTENGKRQLLSLLC

orf128-1.pep FGALLACLFVIDKHNPFIPGMTLLLPCLLTALLIRSMQYGTLPTRILSASPIVFVGKISY
I I I I I : I I I I I I I I : I I I I I : I I I I I I I I I I I I I I I I I I I I I I I I ! I Ml I I I I I I I I I I
orf128ng FGALLVCLFVIDKHDPFIPGITLLLPCLLTALLIRSMQYGTLPTRILSASPIVFVGKISY

orf128-1.pep SLYLYHWIFIAFAHYITGDKQLGLPAVSAVAALTAGFSLLSYYLIEQPLRKRKMTFKKAF
II Ml I! I 111 I II Ml II lllllll I I MM III I M I I Ml I I I I I I I I MIMIIII
orf128ng SLYLYHWIFIAFAHYITGDKQLGLPAVSAVAALTAGFSLLSYYLIEQPLRKRKMTFKKAF

orf128-1.pep FCLYLAPSLILVGYNLYARGILKQEHLRPLPGAPLAAENHFPETVLTLGDSHAGHLRGFL
I I I f I I t I t : I I I t I t I : I I I I I I I I I M I I I : I : M I I : I I I I I 1 I I I I t I 1 I t 1 I I I I
orf128ng FCLYLAPSLMLVGYNLYSRGILKQEHLRPLPGTPVAAENNFPETVLTLGDSHAGHLRGFL

orf128-1.pep DYVGSREGWKAKILSLDSECLVWVDEKLADNPLCRKYRDEVEKAEAVFIAQFYDLRMGGQ
M M : M I M I M M M M I M M M M M M M M II M II II I M M M M I I I I M I
orf128ng DYVGGREGWKAKILSLDSECLVWVDEKLADNPLCRKYRDEVEKAEAVFIAQFYDLRMGGQ

orf128-1.pep PVPRFEAQSFLIPGFPARFRETVKRIAAVKPVYVFANNTSISRSPLREEKLKRFAANQYL
I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I i I I I I I I I I M I I I I I I I
orf128ng PVPRFEAQSFLIPGFKARFRETVKRIAAVKPVYVFANNTSISRSPLREEKLKRFAINQYL

orf128-1.pep RPIQAMGDIGKSNQAVFDLIKDIPNVHWVDAQKYLPKNTVEIYGRYLYGDQDHLTYFGSY
I IMI III III I! M M MMMMM II I MM II I I II I I: M I I I I M Illllllll
orf128ng RPIRAMGDIGKSNQAVFDLVKDIPNVHWVDAQKYLPKNTVEIHGRYLYGDQDHLTYFGSY

orf128-1.pep YMGREFHKHERLLKSSHGGALQX
I I i I M I I I I I I I I I : I I I I I I
orf128ng YMGREFHKHERLLKHSRGGALQX
610 620 
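The identity figures quoted in these comparisons relate the number of identical aligned residues to the length of the overlap. A minimal Python sketch of that ratio follows. It assumes the two rows are already aligned to equal length with '-' marking gaps, and that every column counts toward the overlap. The document does not state the exact convention used by the alignment program, and the function name `percent_identity` is illustrative only.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percentage of alignment columns holding the same residue in both rows.

    Assumes the rows are already aligned (equal length, '-' marks a gap) and
    that every column, gapped or not, counts toward the overlap length.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned rows must have equal length")
    if not seq_a:
        raise ValueError("empty alignment")
    identical = sum(1 for a, b in zip(seq_a, seq_b) if a == b and a != "-")
    return 100.0 * identical / len(seq_a)


# Final row of the ORF128-1 / ORF128ng alignment above.
orf128_1_row = "YMGREFHKHERLLKSSHGGALQX"
orf128ng_row = "YMGREFHKHERLLKHSRGGALQX"
print(f"{percent_identity(orf128_1_row, orf128ng_row):.1f}%")  # 91.3%
```

In that final row, 21 of 23 columns match. The 95.7% figure above applies to the whole 622 aa overlap, not to any single row.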

In addition, ORF128ng shows homology to a hypothetical H. influenzae protein:

sp|P43993|Y392_HAEIN HYPOTHETICAL PROTEIN HI0392 >gi|1074385|pir||B64007
hypothetical protein HI0392 - Haemophilus influenzae (strain Rd KW20)
>gi|1573364 (U32723) H. influenzae predicted coding region HI0392 [Haemophilus
influenzae] Length = 245
Score = 239 bits (604), Expect = 3e-62
Identities = 124/225 (55%), Positives = 152/225 (67%), Gaps = 1/225 (0%)

Query: 38 VDIFFVISGFLITNIILSEIQNGSFSFRDFYTRRIKRIYPXXXXXXXXXXXXXXXXFLYE 97
 +DIFFVISGFLIT II++EIQ SFS + FYTRRIKRIYP F+Y
Sbjct: 1 MDIFFVISGFLITGIIITEIQQNSFSLKQFYTRRIKRIYPAFITVMALVSFIASAIFIYN 60

Query: 98 DFNQMRKTIELSTVFLSNIYLGFRLGYFDLSADENPVLHIWSLAVEEQXXXXXXXXXIFC 157
 DFN++RKTIEL+ FLSN YLG GYFDLSA+ENPVLHIWSLAVE Q I
Sbjct: 61 DFNKLRKTIELAIAFLSNFYLGLTQGYFDLSANENPVLHIWSLAVEGQYYLIFPLILILA 120

Query: 158 YKKTKSLRVLRNISIILFLILTASSFLPAGFYTDILNQPNTYYLSTLRFPELLVGSLLAV 217
 YKK + ++VL I++ILF IL A+SF+ A FY ++L+QPN YYLS LRFPELLVGSLLA+
Sbjct: 121 YKKFREVKVLFIITLILFFILLATSFVSANFYKEVLHQPNIYYLSNLRFPELLVGSLLAI 180

Query: 218 YGQTQNGRRQTENGKRQLLSLLCFGALLVCLFVIDKHDPFIPGIT 262
 Y N + Q +L++L L CLF+++ + FIPGIT
Sbjct: 181 YHNLSN-KVQLSKQVNNILAILSTLLLFSCLFLMNNNIAFIPGIT 224





This analysis, including the identification of several putative transmembrane domains, suggests that
these proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be useful antigens
for vaccines or diagnostics, or for raising antibodies.



Example 99

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 835>: 

1 . . ATTATTTACG AATACCGCTG GATGTTTCTT TACGGCGCAC TGACGACCTT 

51 GGGGCTGACG GTCGTGGCAA C.GCGGGCGG TTCGGTATTG GGTCTGTTGT 

101 TGGCGTTGGC GCGCCTGATT CACTTGGAAA AAGCCGGTGC GCCGATGCGC 

151 GTGCTGGCGT GGGCGTTGCG TAAAGTTTCG CTGCTGTATG TTACGCTGTT 

201 CCGGGGTACG CCGCTGTTTG TGCAGATTGT GATTTGGGCG TATGTGTGGT 

251 TTCCGTTTTT CGTC. . 

This corresponds to the amino acid sequence <SEQ ID 836; ORF129>: 

1 . . IIYEYRWMFL YGALTTLGLT VVAXAGGSVL GLLLALARLI HLEKAGAPMR
51 VLAWALRKVS LLYVTLFRGT PLFVQIVIWA YVWFPFFV. . 

Further work revealed the complete nucleotide sequence <SEQ ID 837>: 

1 ATGGATTTTC GTTTTGACAT TATTTACGAA TACCGCTGGA TGTTTCTTTA 

51 CGGCGCACTG ACGACCTTGG GGCTGACGGT CGTGGCAACG GCGGGCGGTT 

101 CGGTATTGGG TCTGTTGTTG GCGTTGGCGC GCCTGATTCA CTTGGAAAAA 

151 GCCGGTGCGC CGATGCGCGT GCTGGCGTGG GCGTTGCGTA AAGTTTCGCT 

201 GCTGTATGTT ACGCTGTTCC GGGGTACGCC GCTGTTTGTG CAGATTGTGA 

251 TTTGGGCGTA TGTGTGGTTT CCGTTTTTCG TCCATCCTTC AGACGGCATT 

301 TTGGTCAGCG GCGAGGCGGC AATCGCGCTG CGTCGCGGAT ACGGGCCGCT 

351 GATTGCCGGT TCTTTGGCAC TGATCGCCAA CTCGGGGGCG TATATCTGTG 

401 AGATTTTCCG CGCGGGCATC CAGTCTATAG ACAAAGGACA GATGGAGGCG 

451 GCGCGTTCTT TGGGGCTGAC CTATCCGCAG GCGATGCGCT ATGTGATTCT 

501 GCCGCAGGCA TTGCGCCGCA TGCTGCCGCC TTTGGCGAGC GAGTTCATCA 

551 CGCTCTTGAA AGACAGCTCG CTGCTGTCGG TCATTGCTGT GGCGGAGTTG 

601 GCGTATGTTC AGAATACGAT TACGGGCCGG TATTCGGTTT ATGAAGAACC 

651 GCTTTACACC GTCGCCCTGA TTTATCTGTT GATGACGACT TTCTTAGGCT 

701 GGATATTCCT GCGTTTGGAA AAACGTTACA ATCCGCAACA CCGCTGA 

This corresponds to the amino acid sequence <SEQ ID 838; ORF129-l>: 

1 MDFRFDIIYE YRWMFLYGAL TTLGLTVVAT AGGSVLGLLL ALARLIHLEK

51 AGAPMRVLAW ALRKVSLLYV TLFRGTPLFV QIVIWAYVWF PFFVHPSDGI

101 LVSGEAAIAL RRGYGPLIAG SLALIANSGA YICEIFRAGI QSIDKGQMEA

151 ARSLGLTYPQ AMRYVILPQA LRRMLPPLAS EFITLLKDSS LLSVIAVAEL

201 AYVQNTITGR YSVYEEPLYT VALIYLLMTT FLGWIFLRLE KRYNPQHR*

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A) 

ORF129 shows 98.9% identity over an 88 aa overlap with an ORF (ORF129a) from strain A of N.
meningitidis:

10 20 30 40 50 

orf129.pep IIYEYRWMFLYGALTTLGLTVVAXAGGSVLGLLLALARLIHLEKAGAPMRVLAW
I I 1 I I I t I I 1 I 1 I I 1 I t I I 1 t I I r | | | | t I 1 1 1 1 I I I I I ! I I I I I I I I I I 1 t t I
orf129a MDFRFDIIYEYRWMFLYGALTTLGLTVVATAGGSVLGLLLALARLIHLEKAGAPMRVLAW

10 20 30 40 50 60

60 70 80

orf129.pep ALRKVSLLYVTLFRGTPLFVQIVIWAYVWFPFFV

I I I I I I I I I I I I I I I I 1 I 1 I I I 1 I I I I I I 1 II I I
orf129a ALRKVSLLYVTLFRGTPLFVQIVIWAYVWFPFFVHPSDGILVSGEAAIALRRGYGPLIAG

70 80 90 100 110 120

orf129a SLALIANSGAYICEIFRAGIQSIDKGQMEAARSLGLTYPQAMRYVILPQALRRMLPPLAS

130 140 150 160 170 180 

The full-length ORF129a nucleotide sequence <SEQ ID 839> is:

1 ATGGATTTTC GTTTTGACAT TATTTACGAA TACCGCTGGA TGTTTCTTTA 
51 CGGCGCACTG ACGACCTTGG GGCTGACGGT CGTGGCGACG GCGGGCGGTT 
101 CGGTATTGGG TCTGTTGTTG GCGTTGGCGC GCCTGATTCA CTTGGAAAAA

151 GCCGGTGCGC CGATGCGCGT GCTGGCGTGG GCGTTGCGTA AGGTTTCGCT 

201 GCTGTATGTT ACGCTGTTCC GGGGTACGCC GCTGTTTGTG CAGATTGTGA 

251 TTTGGGCGTA TGTGTGGTTT CCGTTTTTCG TCCATCCTTC AGACGGCATT 

301 TTGGTTAGCG GCGAGGCGGC AATCGCGCTG CGTCGCGGAT ACGGGCCGCT 

351 GATTGCCGGT TCTTTGGCAC TGATCGCCAA CTCGGGGGCG TATATCTGTG 

401 AGATTTTCCG CGCGGGCATC CAGTCTATAG ACAAAGGACA GATGGAGGCG 

451 GCGCGTTCTT TGGGGCTGAC CTATCCGCAG GCGATGCGCT ATGTGATTCT 

501 GCCGCAGGCA TTGCGCCGTA TGCTGCCGCC TTTGGCGAGC GAGTTCATCA 

551 CGCTCTTGAA AGACAGCTCG CTGCTGTCGG TCATTGCTGT GGCGGAGTTG 

601 GCGTATGTTC AGAATACGAT TACGGGCCGG TATTCGGTTT ATGAAGAACC 

651 GCTTTACACC GTCGCCCTGA TTTATCTGTT GATGACGACT TTCTTAGGCT 

701 GGATATTCCT GCGTTTGGAA AAACGTTACA ATCCGCAACA CCGCTGA 

This encodes a protein having amino acid sequence <SEQ ID 840>: 

1 MDFRFDIIYE YRWMFLYGAL TTLGLTVVAT AGGSVLGLLL ALARLIHLEK

51 AGAPMRVLAW ALRKVSLLYV TLFRGTPLFV QIVIWAYVWF PFFVHPSDGI

101 LVSGEAAIAL RRGYGPLIAG SLALIANSGA YICEIFRAGI QSIDKGQMEA

151 ARSLGLTYPQ AMRYVILPQA LRRMLPPLAS EFITLLKDSS LLSVIAVAEL

201 AYVQNTITGR YSVYEEPLYT VALIYLLMTT FLGWIFLRLE KRYNPQHR*

ORF129a and ORF129-1 show 100.0% identity in a 248 aa overlap:



orf129a.pep MDFRFDIIYEYRWMFLYGALTTLGLTVVATAGGSVLGLLLALARLIHLEKAGAPMRVLAW
I I I I I I I I I I I I I I I I I I I I I I 1 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I
orf129-1 MDFRFDIIYEYRWMFLYGALTTLGLTVVATAGGSVLGLLLALARLIHLEKAGAPMRVLAW

orf129a.pep ALRKVSLLYVTLFRGTPLFVQIVIWAYVWFPFFVHPSDGILVSGEAAIALRRGYGPLIAG

I I Ml I III II! I M II II I lllllll INI II II! Ill II MM MM I! II MINI !
orf129-1 ALRKVSLLYVTLFRGTPLFVQIVIWAYVWFPFFVHPSDGILVSGEAAIALRRGYGPLIAG

orf129a.pep SLALIANSGAYICEIFRAGIQSIDKGQMEAARSLGLTYPQAMRYVILPQALRRMLPPLAS

II M M M M M I M M M M M M M M M M M I M II I I M II I M M I M M M M
orf129-1 SLALIANSGAYICEIFRAGIQSIDKGQMEAARSLGLTYPQAMRYVILPQALRRMLPPLAS

orf129a.pep EFITLLKDSSLLSVIAVAELAYVQNTITGRYSVYEEPLYTVALIYLLMTTFLGWIFLRLE
I I I II I II I I M I I I I I I I M I I I M II M I M I M I M I I I II I M II I I II II I I I M
orf129-1 EFITLLKDSSLLSVIAVAELAYVQNTITGRYSVYEEPLYTVALIYLLMTTFLGWIFLRLE

orf129a.pep KRYNPQHRX
I 1 I M I I I I
orf129-1 KRYNPQHRX



Homology with a predicted ORF from N. gonorrhoeae

ORF129 shows 98.9% identity over an 88 aa overlap with a predicted ORF (ORF129ng) from
N. gonorrhoeae:

orf129.pep IIYEYRWMFLYGALTTLGLTVVAXAGGSVLGLLLALARLIHLEKAGAPMRVLAW 54

I I I I M I I M M I 1 II I II I I I I M M I I I I I M I I I M I t t II I II M M I I I
orf129ng MDFRFDIIYEYRWMFLYGALTTLGLTVVATAGGSVLGLLLALARLIHLEKAGAPMRVLAW 60

orf129.pep ALRKVSLLYVTLFRGTPLFVQIVIWAYVWFPFFV 88

I I II I I I I I I I I M M I I M I I I I I I I I I I I I I I
orf129ng ALRKVSLLYVTLFRGTPLFVQIVIWAYVWFPFFVILHTAFLGNAMRQSRRVPDKGRWIAG 120

An ORF129ng nucleotide sequence <SEQ ID 841> was predicted to encode a protein having amino 
acid sequence <SEQ ID 842>: 



1 MDFRFDIIYE YRWMFLYGAL TTLGLTVVAT AGGSVLGLLL ALARLIHLEK

51 AGAPMRVLAW ALRKVSLLYV TLFRGTPLFV QIVIWAYVWF PFFVILHTAF

101 LGNAMRQSRR VPDKGRWIAG SLELNCQPRG RKTRGEFPPG ESNLGTEPRN

151 PLSMGQRRFP GCENWYPPQN FIKK*

Further work revealed the following gonococcal sequence <SEQ ID 843>: 



1 ATGGATTTTc gtTTTGACAT TATTTAcgaA TACCGCTGGA TGTTTCTTTA 
51 CGGCGCACTG Acgaccttgg ggctgacggt cgtggcgacg gCGGGCGGTT

101 CGGtattggG TCTGTTGTTG GCGTTGGCGC GCCTGATTCA CTTGGAAAAA 

151 GCCGGTGCGC CGATGCGCGT GCTGGCGTGG GCGTTGCGTA AGGTTTCGCT 

201 GCTGTACGTT ACCCTGTTCC GGGGTACGCC GCTGTTTGTG CAGATTGTGA 

251 TTTGGGCGTA TGTGTGGTTT CCGTTTTTCG TCCATCCTTC AGACGGCATT 

301 TTGGTCAGCG GCGAGGCGGC AATCGCGCTG CGTCGCGGAT ACGGGCCGCT 

351 GATTGCCGGT TCTTTGGCAC TGATCGCCAA CTCGGGGGCG TATATCTGTG 

401 AGATTTTCCG CGCGGGCATC CAGTCTATAG ACAAAGGACA GATGGAGGCG 

451 GCGTGTTCTT TGGGACTGAC CTATCCGCAG GCGATGCGCT ATGTGATTCT 

501 GCCGCAGGCA TTGCGCCGTA TGCTGCCGCC TTTGGCGAGC GAGTTCATCA 

551 CGCTCTTGAA AGACAGCTCG CTGCTGTCGG TCATTGCTGT GGCGGAGTTG 

601 GCGTATGTTC AGAATACGAT TACGGGCCGG TATTCGGTTT ATGAAGAACC 

651 GCTTTACACC GCCGCCCTGA TTTATCTGTT GATGACGACT TTCTTAGGCT 

701 GGATATTCCT GCGTTTGGAA AAACGTTACA ATCCGCAACA CCGCTGA 

This corresponds to the amino acid sequence <SEQ ID 844; ORF129ng-l>: 

1 MDFRFDIIYE YRWMFLYGAL TTLGLTVVAT AGGSVLGLLL ALARLIHLEK

51 AGAPMRVLAW ALRKVSLLYV TLFRGTPLFV QIVIWAYVWF PFFVHPSDGI

101 LVSGEAAIAL RRGYGPLIAG SLALIANSGA YICEIFRAGI QSIDKGQMEA

151 ACSLGLTYPQ AMRYVILPQA LRRMLPPLAS EFITLLKDSS LLSVIAVAEL

201 AYVQNTITGR YSVYEEPLYT AALIYLLMTT FLGWIFLRLE KRYNPQHR*

ORF129ng-1 and ORF129-1 show 99.2% identity in a 248 aa overlap:

orf129-1.pep MDFRFDIIYEYRWMFLYGALTTLGLTVVATAGGSVLGLLLALARLIHLEKAGAPMRVLAW

I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I
orf129ng-1 MDFRFDIIYEYRWMFLYGALTTLGLTVVATAGGSVLGLLLALARLIHLEKAGAPMRVLAW

orf129-1.pep ALRKVSLLYVTLFRGTPLFVQIVIWAYVWFPFFVHPSDGILVSGEAAIALRRGYGPLIAG
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I
orf129ng-1 ALRKVSLLYVTLFRGTPLFVQIVIWAYVWFPFFVHPSDGILVSGEAAIALRRGYGPLIAG

orf129-1.pep SLALIANSGAYICEIFRAGIQSIDKGQMEAARSLGLTYPQAMRYVILPQALRRMLPPLAS
I I I I I I I I ! I I I I I I I I I I I I I I I I I I I I I I I I I I I I I i I I I I I I ) I I I I I I I I I I I I I
orf129ng-1 SLALIANSGAYICEIFRAGIQSIDKGQMEAACSLGLTYPQAMRYVILPQALRRMLPPLAS

orf129-1.pep EFITLLKDSSLLSVIAVAELAYVQNTITGRYSVYEEPLYTVALIYLLMTTFLGWIFLRLE
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I : I I I I I I I I I I I I I I I I I I I
orf129ng-1 EFITLLKDSSLLSVIAVAELAYVQNTITGRYSVYEEPLYTAALIYLLMTTFLGWIFLRLE

orf129-1.pep KRYNPQHRX
I I I I I I I I I
orf129ng-1 KRYNPQHRX
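A 99.2% identity over 248 residues corresponds to two differing positions (246/248 ≈ 99.2%). These can be listed in the usual reference-position-variant notation. The sketch below uses a hypothetical helper, `substitutions`, and assumes equal-length, gap-free aligned rows numbered from a given start position. ORF129-1 is taken as the reference, and the inputs are the rows of the alignment above that contain the differences.

```python
def substitutions(reference: str, variant: str, start: int = 1) -> list[str]:
    """Return differences such as 'R152C' between two aligned, gap-free rows.

    `start` is the residue number of the first character in the rows.
    """
    if len(reference) != len(variant):
        raise ValueError("rows must be aligned and of equal length")
    return [
        f"{ref}{pos}{var}"
        for pos, (ref, var) in enumerate(zip(reference, variant), start=start)
        if ref != var
    ]


# Rows of ORF129-1 (reference) and ORF129ng-1 taken from the alignment above.
print(substitutions("ARSLGLTYPQ", "ACSLGLTYPQ", start=151))  # ['R152C']
print(substitutions("AYVQNTITGRYSVYEEPLYTVALIYLLMTT",
                    "AYVQNTITGRYSVYEEPLYTAALIYLLMTT", start=201))  # ['V221A']
```

The two substitutions it reports, R152C and V221A, account for both mismatches in the 248 aa overlap.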

In addition, ORF129ng-1 is homologous to an ABC transporter from A. fulgidus:

2650409 (AE001090) glutamine ABC transporter, permease protein (glnP)
[Archaeoglobus fulgidus] Length = 224
Score = 132 bits (329), Expect = 2e-30

Identities = 86/178 (48%), Positives = 103/178 (57%), Gaps = 18/178 (10%)

Query: 65 VSLLYVTLFRGTPLFVQIVIWAYVWFPFFVHPSDGILVSGEAAIALRRGYGPLIAGSLAL 124 

+S YV + RGTPL VQI+I +F P+ GI + E A G +AL 

Sbjct: 58 I ST AYVEVT RGT PLLVQ I L I VYFGLPAIGINLQPEPA GIIAL 99 

Query: 125 IANSGAYICEIFRAGIQSIDKGQMEAACSLGLTYPQAMRYVILPQALRRMLPPLASEFIT 184 

SGAYI EI RAGI+SI GQMEAA SLG+TY QAMRYVI PQA R +LP L +EFI 
Sbjct: 100 SICSGAYIAEIVRAGIESIPIGQMEAARSLGMTYLQAMRYVIFPQAFRNILPALGNEFIA 159 

Query: 185 LLKDSSLLSVIAVAELAYVQNTITGRYSVYEEPLYTAALIYLLMTTFLGWIFLRLEKR 242 

LLKDSSLLSVI++ EL V I P AL YL+MT L + +K+ 

Sbjct: 160 LLKDSSLLSVISIVELTRVGRQIVNTTFNAWTPFLGVALFYLMMTIPLSRLVAYSQKK 217 
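In the BLAST summary lines of this document, Identities counts identical residue pairs and Positives counts pairs with a positive substitution score (identities included). Gaps counts gap positions. Each count is taken over the stated alignment length. The reported percentages are consistent with truncation rather than rounding: 152/225 = 67.6% is shown as 67%, and 103/178 = 57.9% as 57%.

The following Python sketch parses such a line and checks that reading. The regular expression and the function names are illustrative assumptions, not part of any BLAST tool.

```python
import re

# Matches fields such as "Identities = 124/225 (55%)".
FIELD = re.compile(r"(\w+)\s*=\s*(\d+)/(\d+)\s*\((\d+)%\)")


def parse_summary(line: str) -> dict:
    """Map each field name to (count, alignment_length, reported_percent)."""
    return {
        name: (int(num), int(den), int(pct))
        for name, num, den, pct in FIELD.findall(line)
    }


def percents_consistent(fields: dict) -> bool:
    """True if every reported percentage equals floor(100 * count / length)."""
    return all(100 * num // den == pct for num, den, pct in fields.values())


hi0392 = parse_summary(
    "Identities = 124/225 (55%), Positives = 152/225 (67%), Gaps = 1/225 (0%)"
)
glnp = parse_summary(
    "Identities = 86/178 (48%), Positives = 103/178 (57%), Gaps = 18/178 (10%)"
)
print(hi0392["Positives"])  # (152, 225, 67)
print(percents_consistent(hi0392), percents_consistent(glnp))  # True True
```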

This analysis, including the identification of transmembrane domains in the two proteins, suggests 
that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be useful 
antigens for vaccines or diagnostics, or for raising antibodies. 
Example 100

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 845>: 

1 . . CTGAAAGAAT GCCGTCTGAA AGACCCTGTT TTTATTCCAA ATATCGTTTA 
51 TAAGAACATC GCCATTACTT TCCTGCTCTT GCACGCCGCC GCCGAACTTT 
101 GGCTGCCCGC GCAAACCGCC GGTTTTACCG CGCTCGCCGT CGGCTTCATC 
151 CTGCTCGCCA AGCTGCGTGA gCTTCACCAT CACGAACTCT TACGTAAACA 
201 cTACGTCCGC ACTTATTACy TGCTCCAACT CTTTGCCGCC GCAGgcTAgT 
251 TTGTGGACAG GCGCGGCGwA ATTACAAAAC CTGCCCGCyT CCGCGCCCCT 
301 GCACCTGATT ACCCTCGGCG GCATGATGGG CGGCGTGATG ATGGTGTGGc 
351 TGACCGCCGG ACTGTGGCAC AGCGGCTTTA CCAAACTCGA CTACCCCAAA 
401 CTCTGCCGCA TTGCCGTCCC CATCCTTTTC GCCGCCGCCG TCTCGCGCGC 
451 TTTCTTGrTG AACGTGAACC CGrTATTTTT CATTACCGTT CCTGCGATTC 
501 TGACCGCCGC CGTATTCGTA CTGTATCTTT TCrCGTTTAT ACCGATATTT 
551 CGGGCGAATG CGTTTACAGA CGATCCGGAr TAr 

This corresponds to the amino acid sequence <SEQ ID 846; ORF130>: 

1 . . LKECRLKDPV FIPNIVYKNI AITFLLLHAA AELWLPAQTA GFTALAVGFI 
51 LLAKLRELHH HELLRKHYVR TYYLLQLFAA AGSLWTGAAX LQNLPASAPL 
101 HLITLGGMMG GVMMVWLTAG LWHSGFTKLD YPKLCRIAVP ILFAAAVSRA
151 FLXNVNPXFF ITVPAILTAA VFVLYLFXFI PIFRANAFTD DPE* 
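The partial sequence above contains IUPAC ambiguity letters (for instance r = A or G, y = C or T, w = A or T). Elsewhere, '.' marks an unread base. The translations show X at the affected codons.

The following Python sketch performs frame-1 translation with the standard genetic code. It assumes that any codon containing a character other than A, C, G or T is reported as X, and that a trailing partial codon is dropped. The names `translate` and `CODON_TABLE` are illustrative.

```python
import re
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order; '*' marks a stop.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate frame 1 of `dna`; codons that are not plain A/C/G/T become 'X'.

    Whitespace and digits (listing coordinates) are stripped, while '.' and
    IUPAC ambiguity letters such as 'r' or 'y' are kept as unresolved bases.
    A trailing partial codon is ignored.
    """
    seq = re.sub(r"[\s\d]", "", dna).upper()
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


print(translate("rTGAACGTGAACCCGrTATTTTTC"))  # XNVNPXFF
```

The stretch rTGAACGTGAACCCGrTATTTTTC from the sequence above translates to XNVNPXFF, matching residues 153-160 of ORF130. A fuller implementation could resolve ambiguous codons whose possible readings all encode the same residue (CTN, for example, is always leucine). This sketch does not attempt that.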

Further work revealed the complete nucleotide sequence <SEQ ID 847>: 

1 ATGCGGCCGT TTTTCGTCGG CGCGGCGGTG CTTGCCATAC TCGGTGCGCT 

51 GGTGTTTTTC ATCAACCCCG GTGCCATCGT CCTGCACCGC CAAATTTTCT 

101 TGGAACTTAT GCTGCCGGCG GCATACGGCG GTTTTTTGAC TGCGGCTTTG 

151 TTGGACTGGA CGGGTTTTTC GGGTAACCTG AAACCTGTCG CGACTTTGAT 

201 GGCGGCATTA TTGCTCGCCG CATCCGCTAT ACTGCCCTTT TCGCCGCAAA 

251 CTGCCTCGTT TTTCGTCGCC GCCTATTGGC TGGTGTTGCT GCTGTTCTGC 

301 GCCCGGCTGA TTTGGCTAGA CCGAAACACC GACAACTTCG CCCTGCTAAT 

351 GTTACTTGCC GCGTTCACTG TTTTTCAGAC GGCATATGCC GTCAGCGGCG 

401 ATTTGAACCT GTTGCGCGCG CAAGTGCATC TAAATATGGC GGCGGTGATG 

451 TTCGTATCCG TGCGCGTCAG TATTCTTTTG GGCGCGGAAG CCCTGAAAGA 

501 ATGCCGTCTG AAAGACCCTG TTTTTATTCC AAATATCGTT TATAAAAACA 

551 TCGCCATTAC TTTCCTGCTC TTGCACGCCG CCGCCGAACT TTGGCTGCCC 

601 GCGCAAACCG CCGGTTTTAC CGCGCTCGCC GTCGGCTTCA TCCTGCTCGC 

651 CAAGCTGCGT GAGCTTCACC ATCACGAACT CTTACGTAAA CACTACGTCC 

701 GCACTTATTA CCTGCTCCAA CTCTTTGCCG CCGCAGGCTA TTTGTGGACA 

751 GGCGCGGCGA AATTACAAAA CCTGCCCGCC TCCGCGCCCC TGCACCTGAT 

801 TACCCTCGGC GGCATGATGG GCGGCGTGAT GATGGTGTGG CTGACCGCCG 

851 GACTGTGGCA CAGCGGCTTT ACCAAACTCG ACTACCCCAA ACTCTGCCGC 

901 ATTGCCGTCC CCATCCTTTT CGCCGCCGCC GTCTCGCGCG CTTTCTTGAT 

951 GAACGTGAAC CCGATATTTT TCATTACCGT TCCTGCGATT CTGACCGCCG 

1001 CCGTATTCGT ACTGTATCTT TTCACGTTTA TACCGATATT TCGGGCGAAT 

1051 GCGTTTACAG ACGATCCGGA ATAA 

This corresponds to the amino acid sequence <SEQ ID 848; ORF130-1>: 

1 MRPFFVGAAV LAILGALVFF INPGAIVLHR QIFLELMLPA AYGGFLTAAL

51 LDWTGFSGNL KPVATLMAAL LLAASAILPF SPQTASFFVA AYWLVLLLFC

101 ARLIWLDRNT DNFALLMLLA AFTVFQTAYA VSGDLNLLRA QVHLNMAAVM

151 FVSVRVSILL GAEALKECRL KDPVFIPNIV YKNIAITFLL LHAAAELWLP

201 AQTAGFTALA VGFILLAKLR ELHHHELLRK HYVRTYYLLQ LFAAAGYLWT

251 GAAKLQNLPA SAPLHLITLG GMMGGVMMVW LTAGLWHSGF TKLDYPKLCR

301 IAVPILFAAA VSRAFLMNVN PIFFITVPAI LTAAVFVLYL FTFIPIFRAN

351 AFTDDPE*

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A)

ORF130 shows 94.3% identity over a 193 aa overlap with an ORF (ORF130a) from strain A of N.
meningitidis:
10 20 30

orf130.pep LKECRLKDPVFIPNIVYKNIAITFLLLHAA

I I I I I I I I I I I! I I : I I I I I II I I I I I I I I
orf130a LNLLRAQVHLNMAAVMFVSVRVSILLGAEALKECRLKDPVFIPNVVYKNIAITFLLLHAA
140 150 160 170 180 190

40 50 60 70 80 90

orf130.pep AELWLPAQTAGFTALAVGFILLAKLRELHHHELLRKHYVRTYYLLQLFAAAGSLWTGAAX

I I I I I I I I I I I I I : I I I I I I I I I I I I I I I I I I I I I I ! I I I I I I I I I I I I I I I I I I I I I
orf130a AELWLPAQTAGFTSLAVGFILLAKLRELHHHELLRKHYVRTYYLLQLFAAAGYLWTGAAK
200 210 220 230 240 250

100 110 120 130 140 150

orf130.pep LQNLPASAPLHLITLGGMMGGVMMVWLTAGLWHSGFTKLDYPKLCRIAVPILFAAAVSRA

I I I I I I I II I I I I I I I I I I I : I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I
orf130a LQNLPASAPLHLITLGGMMGSVMMVWLTAGLWHSGFTKLDYPKLCRIAVPILFAAAVSRA
260 270 280 290 300 310

160 170 180 190

orf130.pep FLXNVNPXFFITVPAILTAAVFVLYLFXFIPIFRANAFTDDPEX
Mill I f M I I I I I I I I I I I I I I I : I I M I I I I I I I i I I
orf130a VLMNVNPIFFITVPAILTAAVFVLYLLTFVPIFRANAFTDDPEX
320 330 340 350 

The full-length ORF130a nucleotide sequence <SEQ ID 849> is:



1 ATGCGGCCGT TTTTCGTCGG CGCGGCGGTG CTTGCCATAC TCGGTGCGCT 

51 GGTGTTTTTC ATCAACCCCG GTGCCATCGT CCTGCACCGC CAAATTTTCT 

101 TGGAACTTAT GCTGCCGGCG GCATACGGCG GTTTTTTGAC TGCGGCTTTG 

151 TTGGACTGGA CGGGTTTTTC GGGTAACCTG AAACCTGTCG CGACTTTGAT 

201 GGCGGCATTA TTGCTCGCCG CATCCGCTAT ACTGCCCTTT TCGCCGCAAA 

251 CTGCCTCGTT TTTCGTCGCC GCCTATTGGC TGGTGTTGCT GCTGTTCTGC 

301 GCCCGGCTGA TTTGGCTAGA CCGAAACACC GACAACTTCG CCCTGCTAAT 

351 GTTACTTGCC GCGTTCACTG TTTTTCAGAC GGCATATGCC GTCAGCGGCG 

401 ATTTGAACCT GTTGCGCGCG CAAGTGCATC TAAATATGGC GGCGGTGATG 

451 TTCGTATCCG TGCGCGTCAG TATTCTTTTG GGCGCGGAAG CCCTGAAAGA

501 ATGCCGTCTG AAAGACCCAG TATTCATCCC CAATGTCGTC TATAAAAACA 

551 TCGCCATTAC CTTCCTGCTC CTGCACGCCG CCGCCGAACT TTGGCTGCCT 

601 GCGCAAACCG CCGGTTTTAC CTCGCTCGCC GTCGGCTTTA TCCTGCTTGC 

651 CAAGCTGCGT GAGCTTCACC ATCACGAACT CCTGCGCAAA CACTACGTCC 

701 GCACTTATTA CCTGCTCCAA CTCTTTGCCG CCGCAGGCTA TTTGTGGACA 

751 GGCGCGGCGA AATTACAAAA CCTGCCCGCC TCCGCGCCCC TGCACCTGAT 

801 TACCCTCGGT GGCATGATGG GCAGCGTGAT GATGGTGTGG CTGACTGCCG 

851 GACTGTGGCA CAGCGGCTTT ACCAAGCTCG ACTACCCGAA ACTCTGCCGC 

901 ATCGCCGTCC CCATCCTNTT CGCCGCCGCC GTTTCGCGCG CTGTTTTAAT 

951 GAACGTAAAC CCGATATTCT TCATCACCGT CCCCGCAATT CTGACCGCCG 

1001 CCGTGTTCGT GCTTTACCTG CTGACATTCG TACCGATCTT TCGGGCGAAC 

1051 GCGTTTACAG ACGATCCGGA ATAA 

This encodes a protein having amino acid sequence <SEQ ID 850>: 



1 MRPFFVGAAV LAILGALVFF INPGAIVLHR QIFLELMLPA AYGGFLTAAL

51 LDWTGFSGNL KPVATLMAAL LLAASAILPF SPQTASFFVA AYWLVLLLFC

101 ARLIWLDRNT DNFALLMLLA AFTVFQTAYA VSGDLNLLRA QVHLNMAAVM

151 FVSVRVSILL GAEALKECRL KDPVFIPNVV YKNIAITFLL LHAAAELWLP

201 AQTAGFTSLA VGFILLAKLR ELHHHELLRK HYVRTYYLLQ LFAAAGYLWT

251 GAAKLQNLPA SAPLHLITLG GMMGSVMMVW LTAGLWHSGF TKLDYPKLCR

301 IAVPILFAAA VSRAVLMNVN PIFFITVPAI LTAAVFVLYL LTFVPIFRAN

351 AFTDDPE* 

ORF130a and ORF130-1 show 98.3% identity in a 357 aa overlap:



orf130a.pep MRPFFVGAAVLAILGALVFFINPGAIVLHRQIFLELMLPAAYGGFLTAALLDWTGFSGNL

M I I I I I I I I I I I I I I I I II I I I 1 I I I I I I I I 1 I I I I II I I I M I I I I I I I I I I I I I I M
orf130-1 MRPFFVGAAVLAILGALVFFINPGAIVLHRQIFLELMLPAAYGGFLTAALLDWTGFSGNL

orf130a.pep KPVATLMAALLLAASAILPFSPQTASFFVAAYWLVLLLFCARLIWLDRNTDNFALLMLLA

1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 k 1 1 1 1 1 1 1 1 r i i 1 1 1 1 1 1 1 e 1 1 1 1 1 1 1 1 1 1 1 1 1
orf130-1 KPVATLMAALLLAASAILPFSPQTASFFVAAYWLVLLLFCARLIWLDRNTDNFALLMLLA

orf130a.pep AFTVFQTAYAVSGDLNLLRAQVHLNMAAVMFVSVRVSILLGAEALKECRLKDPVFIPNVV
I j I I! M I I II II jllllil I IMM I I II II 111 I I I I I I II I I II I II llll!!l|:|
orf130-1 AFTVFQTAYAVSGDLNLLRAQVHLNMAAVMFVSVRVSILLGAEALKECRLKDPVFIPNIV

orf130a.pep YKNIAITFLLLHAAAELWLPAQTAGFTSLAVGFILLAKLRELHHHELLRKHYVRTYYLLQ
I I I I ! I I I I I i I I I I I I I I I I I I I I I I : I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I
orf130-1 YKNIAITFLLLHAAAELWLPAQTAGFTALAVGFILLAKLRELHHHELLRKHYVRTYYLLQ

orf130a.pep LFAAAGYLWTGAAKLQNLPASAPLHLITLGGMMGSVMMVWLTAGLWHSGFTKLDYPKLCR
I I I I I I i I I I I I I I I I I I I I I I I I I I I I I I I I I I : I I I I I I I I I I I I I I I I I I I I I I I I I
orf130-1 LFAAAGYLWTGAAKLQNLPASAPLHLITLGGMMGGVMMVWLTAGLWHSGFTKLDYPKLCR

orf130a.pep IAVPILFAAAVSRAVLMNVNPIFFITVPAILTAAVFVLYLLTFVPIFRANAFTDDPE

I I I I 1 I I I I I I I I I I I I I I I I I I I I I I ! I I I I I I I I I I I : i I : II I I I I I I I I I I I
orf130-1 IAVPILFAAAVSRAFLMNVNPIFFITVPAILTAAVFVLYLFTFIPIFRANAFTDDPE

Homology with a predicted ORF from N. gonorrhoeae 

ORF130 shows 91.7% identity over a 193 aa overlap with a predicted ORF (ORF130ng) from
N. gonorrhoeae:

orf130.pep LKECRLKDPVFIPNIVYKNIAITFLLLHAA 30

II I I I I I I I I I I I I :: I I I I I I I I I I I I I
orf130ng LNLLRAQVHLNMAAVMFVSVRVSVLLGTETLKECRLKDPVFIPNVIYKNIAIT-LLLHAA 201

orf130.pep AELWLPAQTAGFTALAVGFILLAKLRELHHHELLRKHYVRTYYLLQLFAAAGSLWTGAAX 90

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I 1 I I
orf130ng AELWLPAQTAGFTALAVGFILLAKLRELHHHELLRKHYVRTYYLLQLFAAAGYLWTGAAK 261

orf130.pep LQNLPASAPLHLITLGGMMGGVMMVWLTAGLWHSGFTKLDYPKLCRIAVPILFAAAVSRA 150

t t t I I I I I f I 1 I 1 I 1 t 1 I I I I II I I I I I I I I I I I I I I I I I I I I I 1 I I I llihlllll
orf130ng LQNLPASAPLHLITLGGMTGGVMMVWLTAGLWHSGFTKLDYPKLCRIAVSILFASAVSRA 321

orf130.pep FLXNVNPXFFITVPAILTAAVFVLYLFXFIPIFRANAFTDDPE 193

I I I I I I I I I I I I I I I I I I : I I I :: I : I I I I I I I I I I I I I
orf130ng VLMNVNPIFFITVPEILTAAVFMLYLLTFVPIFRANAFTDDPE 364

An ORF130ng nucleotide sequence <SEQ ID 851> was predicted to encode a protein having amino
acid sequence <SEQ ID 852>:



1 MNKFFTHPMH PFFVGAAVLA ILGALVFFHQ PRRYHPAPPN FLGTYAAGCI

51 RRFFDYRFVG PDGFFRQPET CRYFDGGVVA CCGCFIAVFT ATCRIFRRRL

101 LAGVAAVLRL ADLARRQHRT LRSVDVTAAF TVFQTAYAVS GDLNLLRAQV

151 HLNMAAVMFV SVRVSVLLGT ETLKECRLKD PVFIPNVIYK NIAITLLLHA

201 AAELWLPAQT AGFTALAVGF ILLAKLRELH HHELLRKHYV RTYYLLQLFA

251 AAGYLWTGAA KLQNLPASAP LHLITLGGMT GGVMMVWLTA GLWHSGFTKL

301 DYPKLCRIAV SILFASAVSR AVLMNVNPIF FITVPEILTA AVFMLYLLTF

Further work revealed the following gonococcal DNA sequence <SEQ ID 853>: 



1 ATGCGCCCGT TTTTCGTCGG TGCGGCAGTA CTTGCCATAC TCGGTGCGTT 

51 GGTGTTTTTT ATCAACCCCG GCGCTATCAT CCTGCACCGC CAAATTTTCT 

101 TGGAACTTAT GCTGCCGGCT GCATACGGCG GTTTTTTGAC TACCGCTTTG 

151 TTGGACCGGA CGGGTTTTTC AGGCAACCTG AAACCTGCCG CTACTTTGAT 

201 GGCGGTGTTG TTGCTTGTTG CGGCTGTTTT ATTGCCGTTT TTACCGCAAC 

251 TTGCCGCATT TTTCGTCGCC GCCTATTGGC TGGTGTTGCT GCTGTTCTGC 

301 GCCTGGCTGA TTTGGCTCGA CCGCAACACC GACAACTTCG CTCTGTTGAT 

351 GTTACTTGCC GCATTTACCG TTTTTCAGAC GGCCTATGCC GTCAGCGGCG 

401 ATTTGAACTT ACTGCGCGCG CAAGTGCATT TGAATATGGC GGCGGTCATG 

451 TTCGTATCCG TCCGCGTCAG CGTCCTTTTG GGCACGGAAA CCCTGAAAGA 

501 ATGCCGTCTG AAAGACCCCG TATTCATCCC CAACGTTATC TATAAAAACA 

551 TCGCCATCAC CCTGCTGCTG CACGCCGCCG CCGAACTTTG GCTGCCCGCG 

601 CAAACCGCCG GTTTTACTGC GCTTGCCGTC GGCTTCATCC TGCTCGCCAA 

651 GCTGCGCGAA CTGCACCATC ACGAACTCTT ACGCAAACAC TACGTCCGCA 

701 CTTATTACCT GCTCCAGCTC TTTGCCGCCG CAGGTTATCT GTGGACAGGC 

751 GCGGCGAAAC TGCAAAACCT GCCCGCCTCC GCGCCCCTGC ACCTGATTAC 

801 CCTCGGCGGC ATGACGGGTG GCGTGATGAT GGTGTGGCTG ACTGCCGGAC 

851 TGTGGCACAG CGGCTTTACC AAACTCGACT ACCCGAAACT CTGCCGCATC 






901 GCCGTCTCCA TCCTTTTCGC CTCCGCCGTT TCGCGCGCTG TTTTAATGAA 

951 CGTGAATCCG ATATTCTTCA TCACCGTTCC CGAGATTCTG ACCGCCGCCG 

1001 TGTTCATGCT TTACCTGCTG ACGTTCGTAC CGATTTTTCG AGCGAACGCG 

1051 TTTACAGACG ATCCGGAATA A 

This corresponds to the amino acid sequence <SEQ ID 854; ORF130ng-1>:

1 MRPFFVGAAV LAILGALVFF INPGAIILHR QIFLELMLPA AYGGFLTTAL

51 LDRTGFSGNL KPAATLMAVL LLVAAVLLPF LPQLAAFFVA AYWLVLLLFC

101 AWLIWLDRNT DNFALLMLLA AFTVFQTAYA VSGDLNLLRA QVHLNMAAVM

151 FVSVRVSVLL GTETLKECRL KDPVFIPNVI YKNIAITLLL HAAAELWLPA

201 QTAGFTALAV GFILLAKLRE LHHHELLRKH YVRTYYLLQL FAAAGYLWTG

251 AAKLQNLPAS APLHLITLGG MTGGVMMVWL TAGLWHSGFT KLDYPKLCRI

301 AVSILFASAV SRAVLMNVNP IFFITVPEIL TAAVFMLYLL TFVPIFRANA

351 FTDDPE*



ORF130ng-1 and ORF130-1 show 92.4% identity in a 357 aa overlap:

orf130-1.pep MRPFFVGAAVLAILGALVFFINPGAIVLHRQIFLELMLPAAYGGFLTAALLDWTGFSGNL
I I I I I II i I I I I I I I I I I I I I I I II I : I I I I I I I I I I I M I I I I I I I : I I I I I I M I I I
orf130ng-1 MRPFFVGAAVLAILGALVFFINPGAIILHRQIFLELMLPAAYGGFLTTALLDRTGFSGNL

orf130-1.pep KPVATLMAALLLAASAILPFSPQTASFFVAAYWLVLLLFCARLIWLDRNTDNFALLMLLA
I I : II I I 1:1 I I: |::: I I I II I : II I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I
orf130ng-1 KPAATLMAVLLLVAAVLLPFLPQLAAFFVAAYWLVLLLFCAWLIWLDRNTDNFALLMLLA

orf130-1.pep AFTVFQTAYAVSGDLNLLRAQVHLNMAAVMFVSVRVSILLGAEALKECRLKDPVFIPNIV
I I I I I II M II I II II I I I I I I II 11 M I I I I M t I I : I I I : I : I I I I I I I I I I I I II: :
orf130ng-1 AFTVFQTAYAVSGDLNLLRAQVHLNMAAVMFVSVRVSVLLGTETLKECRLKDPVFIPNVI

orf130-1.pep YKNIAITFLLLHAAAELWLPAQTAGFTALAVGFILLAKLRELHHHELLRKHYVRTYYLLQ
I I I I I I I I I 1 I I I I I I I I I I I I I I I I I I I I I I I I M I I I I I I I I I I I I I I I I I I I I I I I
orf130ng-1 YKNIAIT-LLLHAAAELWLPAQTAGFTALAVGFILLAKLRELHHHELLRKHYVRTYYLLQ

orf130-1.pep LFAAAGYLWTGAAKLQNLPASAPLHLITLGGMMGGVMMVWLTAGLWHSGFTKLDYPKLCR
I I I I I I I I II I I II I II I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I II M I I I I
orf130ng-1 LFAAAGYLWTGAAKLQNLPASAPLHLITLGGMTGGVMMVWLTAGLWHSGFTKLDYPKLCR

orf130-1.pep IAVPILFAAAVSRAFLMNVNPIFFITVPAILTAAVFVLYLFTFIPIFRANAFTDDPEX
III II 11:11111 IMII1IIIIIII M I ! t I I : I I I : I I : I I I II I I I I I I M I
orf130ng-1 IAVSILFASAVSRAVLMNVNPIFFITVPEILTAAVFMLYLLTFVPIFRANAFTDDPEX



Based on this analysis, it is predicted that the proteins from N. meningitidis and N. gonorrhoeae, and 
their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies. 

Example 101

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 855>: 



1 ATGGAAATTC GGGCAATAAA ATATACGGCA ATGGCTGCGT TGCTTGCATT

51 TACGGTTGCA GGCTGCCGGC TGGCGGGGTG GTATGAGTGT TCGTCCCTCA

101 CCGGCTGGTG TAAGCCGAGA AAACCGGCTG CCATCGATTT TTGGGATATT

151 GGCGGCGAGA GTCCGCCGTC TTTAGGGGAC TACGAGATAC CGCTTTCAGA

201 CGGCAATAGT TCCGTCAGGG CAAACGAATA TGAATCCGCA CAACAATCTT

251 ACTTTTACAG GAAAATAGGG AAGTTTGAAG C.TGCGGGCT GGATTGGCGT

301 ACGCGTGACG GCAAACCTTT GATTGAGACG TTCAAACAGG GAGGATTTGA

351 CTGCTTGGAA AAG. .



This corresponds to the amino acid sequence <SEQ ID 856; ORF131>: 

1 MEIRAIKYTA MAALLAFTVA GCRLAGWYEC SSLTGWCKPR KPAAIDFWDI 
51 GGESPPSLGD YEIPLSDGNS SVRANEYESA QQSYFYRKIG KFEXCGLDWR 
101 TRDGKPLIET FKQGGFDCLE K. . 

Further work revealed the complete nucleotide sequence <SEQ ID 857>: 






1 ATGGAAATTC GGGCAATAAA ATATACGGCA ATGGCTGCGT TGCTTGCATT 
51 TACGGTTGCA GGCTGCCGGC TGGCGGGGTG GTATGAGTGT TCGTCCCTCA 
101 CCGGCTGGTG TAAGCCGAGA AAACCGGCTG CCATCGATTT TTGGGATATT 

151 GGCGGCGAGA GTCCGCCGTC TTTAGGGGAC TACGAGATAC CGCTTTCAGA 

201 CGGCAATCGT TCCGTCAGGG CAAACGAATA TGAATCCGCA CAACAATCTT 

251 ACTTTTACAG GAAAATAGGG AAGTTTGAAG CCTGCGGGCT GGATTGGCGT 

301 ACGCGTGACG GCAAACCTTT GATTGAGACG TTCAAACAGG GAGGATTTGA 

351 CTGCTTGGAA AAGCAGGGGT TGCGGCGCAA CGGTCTGTCC GAGCGCGTCC 

401 GATGGTAA 

This corresponds to the amino acid sequence <SEQ ID 858; ORF131-1>:

1 MEIRAIKYTA MAALLAFTVA GCRLAGWYEC SSLTGWCKPR KPAAIDFWDI
51 GGESPPSLGD YEIPLSDGNR SVRANEYESA QQSYFYRKIG KFEACGLDWR
101 TRDGKPLIET FKQGGFDCLE KQGLRRNGLS ERVRW*
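<SEQ ID 858> is obtained from <SEQ ID 857> by translating it codon by codon. The Python sketch below illustrates that step under a few assumptions. It uses the standard genetic code (NCBI translation table 1), reads from the first base in frame 1, and marks stop codons with "*" as the listings do. The names `translate` and `CODON_TABLE` are illustrative and do not come from the original analysis.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1). Codons are enumerated in
# TCAG order (TTT, TTC, TTA, TTG, TCT, ... GGG) to line up with AMINO_ACIDS.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate DNA in reading frame 1.

    Stop codons become '*'; codons containing anything other than A/C/G/T
    (for example an ambiguity code) become 'X'. A trailing partial codon is ignored.
    """
    seq = "".join(dna.split()).upper()  # remove the spaces and line breaks of the listing
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


# <SEQ ID 857>, copied from the listing without the position numbers.
SEQ_ID_857 = """
ATGGAAATTC GGGCAATAAA ATATACGGCA ATGGCTGCGT TGCTTGCATT
TACGGTTGCA GGCTGCCGGC TGGCGGGGTG GTATGAGTGT TCGTCCCTCA
CCGGCTGGTG TAAGCCGAGA AAACCGGCTG CCATCGATTT TTGGGATATT
GGCGGCGAGA GTCCGCCGTC TTTAGGGGAC TACGAGATAC CGCTTTCAGA
CGGCAATCGT TCCGTCAGGG CAAACGAATA TGAATCCGCA CAACAATCTT
ACTTTTACAG GAAAATAGGG AAGTTTGAAG CCTGCGGGCT GGATTGGCGT
ACGCGTGACG GCAAACCTTT GATTGAGACG TTCAAACAGG GAGGATTTGA
CTGCTTGGAA AAGCAGGGGT TGCGGCGCAA CGGTCTGTCC GAGCGCGTCC
GATGGTAA
"""

print(translate(SEQ_ID_857))
```

Run as shown, the sketch prints the 135-residue sequence of <SEQ ID 858> followed by the stop marker.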

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A) 

ORF131 shows 95.0% identity over a 121 aa overlap with an ORF (ORF131a) from strain A of N. meningitidis:

orf131.pep  MEIRAIKYTAMAALLAFTVAGCRLAGWYECSSLTGWCKPRKPAAIDFWDIGGESPPSLGD
orf131a     MEIRAIKYTAMAALLAFTVAGCRLAGWYECSSLSGWCKPRKPAAIDFWDIGGESPPSLED

orf131.pep  YEIPLSDGNSSVRANEYESAQQSYFYRKIGKFEXCGLDWRTRDGKPLIETFKQGGFDCLE
orf131a     YEIPLSDGNRSVRANEYESAQQSYFYRKIGKFEACGLDWRTRDGKPLIETFKQEGFDCLK

orf131.pep  K
orf131a     KQGLRRNGLSERVRWX

The complete length ORF131a nucleotide sequence <SEQ ID 859> is: 

1 ATGGAAATTC GGGCAATAAA ATATACGGCA ATGGCTGCGT TGCTTGCATT 

51 TACGGTTGCA GGCTGCCGGT TGGCAGGTTG GTATGAGTGT TCGTCCCTGT 

101 CCGGCTGGTG TAAGCCGAGA AAACCTGCCG CCATCGATTT TTGGGATATT 

151 GGCGGCGAGA GTCCTCCGTC TTTAGAGGAC TACGAGATAC CGCTTTCAGA 

201 CGGCAATCGT TCCGTCAGGG CAAACGAATA TGAATCCGCA CAACAATCTT 

251 ACTTTTACAG GAAAATAGGG AAGTTTGAAG CCTGCGGGTT GGATTGGCGT 

301 ACGCGTGACG GCAAACCTTT GATTGAGACG TTCAAACAGG AAGGTTTTGA 

351 TTGTTTGAAA AAGCAGGGGT TGCGGCGCAA CGGTCTGTCC GAGCGCGTCC 

401 GATGGTAA 

This encodes a protein having amino acid sequence <SEQ ID 860>: 

1 MEIRAIKYTA MAALLAFTVA GCRLAGWYEC SSLSGWCKPR KPAAIDFWDI 
51 GGESPPSLED YEIPLSDGNR SVRANEYESA QQSYFYRKIG KFEACGLDWR 
101 TRDGKPLIET FKQEGFDCLK KQGLRRNGLS ERVRW* 

ORF131a and ORF131-1 show 97.0% identity in 135 aa overlap: 



orf131a.pep  MEIRAIKYTAMAALLAFTVAGCRLAGWYECSSLSGWCKPRKPAAIDFWDIGGESPPSLED
orf131-1     MEIRAIKYTAMAALLAFTVAGCRLAGWYECSSLTGWCKPRKPAAIDFWDIGGESPPSLGD

orf131a.pep  YEIPLSDGNRSVRANEYESAQQSYFYRKIGKFEACGLDWRTRDGKPLIETFKQEGFDCLK
orf131-1     YEIPLSDGNRSVRANEYESAQQSYFYRKIGKFEACGLDWRTRDGKPLIETFKQGGFDCLE

orf131a.pep  KQGLRRNGLSERVRWX
orf131-1     KQGLRRNGLSERVRWX
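The identity figures quoted for these gap-free comparisons can be reproduced by counting identical positions. The sketch below assumes that percent identity is the number of identical positions divided by the aligned length, with no gaps and with the terminal stop left out. The program that produced the alignments is not named here, and it may treat gaps and ambiguous residues ("X") differently. The function name `percent_identity` is illustrative.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent identity of two gap-free sequences compared position by position."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same aligned length")
    identical = sum(x == y for x, y in zip(a, b))
    return 100.0 * identical / len(a)


# Sequences from <SEQ ID 858>, <SEQ ID 860> and <SEQ ID 864>, terminal stop omitted.
ORF131_1 = ("MEIRAIKYTAMAALLAFTVAGCRLAGWYECSSLTGWCKPRKPAAIDFWDI"
            "GGESPPSLGDYEIPLSDGNRSVRANEYESAQQSYFYRKIGKFEACGLDWR"
            "TRDGKPLIETFKQGGFDCLEKQGLRRNGLSERVRW")
ORF131A = ("MEIRAIKYTAMAALLAFTVAGCRLAGWYECSSLSGWCKPRKPAAIDFWDI"
           "GGESPPSLEDYEIPLSDGNRSVRANEYESAQQSYFYRKIGKFEACGLDWR"
           "TRDGKPLIETFKQEGFDCLKKQGLRRNGLSERVRW")
ORF131NG_1 = ("MEIRVIKYTATAALFAFTVAGCRLAGWYECSSLSGWCKPRKPAAIDFWDI"
              "GGESPLSLEDYEIPLSDGNRSVRANEYESAQKSYFYRKIGKFEACGLDWR"
              "TRDGKPLVERFKQEGFDCLEKQGLRRNGLSERVRW")

for name, seq in (("ORF131a", ORF131A), ("ORF131ng-1", ORF131NG_1)):
    print(f"{name} vs ORF131-1: {percent_identity(seq, ORF131_1):.1f}% over {len(seq)} aa")
```

Under these assumptions the two comparisons come out at 97.0% (4 differences in 135 positions) and 92.6% (10 differences in 135 positions), which matches the figures stated for these pairs.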

Homology with a predicted ORF from N. gonorrhoeae 

ORF131 shows 89.3% identity over 121 aa overlap with a predicted ORF (ORF131ng) from N. gonorrhoeae:

orf131.pep  MEIRAIKYTAMAALLAFTVAGCRLAGWYECSSLTGWCKPRKPAAIDFWDIGGESPPSLGD  60
orf131ng    MEIRVIKYTATAALFAFTVAGCRLAGWYECLSLSGWCKPRKPAAIDFWDIGGESPLSLED  60

orf131.pep  YEIPLSDGNSSVRANEYESAQQSYFYRKIGKFEXCGLDWRTRDGKPLIETFKQGGFDCLE  120
orf131ng    YEIPLSDGNRSVRANEYESAQKSYFYRKIGKFEACGLDWRTRDGKPLVERFKQEGFDCLE  120

orf131.pep  K  121
orf131ng    KQGLRRNGLSERVRW  134

A complete length ORF131ng nucleotide sequence <SEQ ID 861> was predicted to encode a 
protein having amino acid sequence <SEQ ID 862>: 

1 MEIRVIKYTA TAALFAFTVA GCRLAGWYEC LSLSGWCKPR KPAAIDFWDI
51 GGESPLSLED YEIPLSDGNR SVRANEYESA QKSYFYRKIG KFEACGLDWR
101 TRDGKPLVER FKQEGFDCLE KQGLRRNGLS ERVRW*

Further work revealed the following gonococcal DNA sequence <SEQ ID 863>: 

1 ATGGAAATTC GGGTAATAAA ATATACGGCA ACGGCTGCGT TGTTTGCATT 

51 TACGGTTGCA GGCTGCCGGC TGGCGGGGTG GTATGAGTGT TCGTCCTTGT 

101 CCGGCTGGTG TAAGCCGAGA AAACCTGCCG CCATCGATTT TTGGGATATT 

151 GGCGGCGAGA GtccgctGTC TTTAGAGGAC TACGAGATAC CGCTTTCAGA 

201 CGGCAATCGT TCCGTCAGGG CAAACGAATA TGAATCCGCG CAAAAATCTT 

251 ACTTTTATAG GAAAATAGGG AAGTTTGAAG CCTGCGGGTT GGATTGGCGT 

301 ACGCGTGACG GCAAACCTTT GGTTGAGAGG TTCAAACAGG AAGGTTTCGA 

351 CTGTTTGGAA AAGCAGGGGT TGCGGCGCAA CGGCCTGTCC GAGCGCGTCC 

401 GATGGTAA 

This corresponds to the amino acid sequence <SEQ ID 864; ORF131ng-1>:

1 MEIRVIKYTA TAALFAFTVA GCRLAGWYEC SSLSGWCKPR KPAAIDFWDI
51 GGESPLSLED YEIPLSDGNR SVRANEYESA QKSYFYRKIG KFEACGLDWR
101 TRDGKPLVER FKQEGFDCLE KQGLRRNGLS ERVRW*

ORF131ng-1 and ORF131-1 show 92.6% identity in 135 aa overlap:

orf131ng-1.pep  MEIRVIKYTATAALFAFTVAGCRLAGWYECSSLSGWCKPRKPAAIDFWDIGGESPLSLED
orf131-1        MEIRAIKYTAMAALLAFTVAGCRLAGWYECSSLTGWCKPRKPAAIDFWDIGGESPPSLGD

orf131ng-1.pep  YEIPLSDGNRSVRANEYESAQKSYFYRKIGKFEACGLDWRTRDGKPLVERFKQEGFDCLE
orf131-1        YEIPLSDGNRSVRANEYESAQQSYFYRKIGKFEACGLDWRTRDGKPLIETFKQGGFDCLE

orf131ng-1.pep  KQGLRRNGLSERVRWX
orf131-1        KQGLRRNGLSERVRWX

Based on the presence of a predicted prokaryotic membrane lipoprotein lipid attachment site, it is 
predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be 
useful antigens for vaccines or diagnostics, or for raising antibodies. 
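The lipoprotein prediction rests on a sequence motif that ends in an N-terminal-region cysteine. The sketch below illustrates such a scan. It looks for the PROSITE PS00013 ("prokaryotic membrane lipoprotein lipid attachment site") consensus, transcribed here as {DERK}(6)-[LIVMFWSTAG](2)-[LIVMFYSTAGCQ]-[AGS]-C. It also applies two PROSITE conditions: the cysteine must lie between residues 15 and 35, and a Lys or Arg must occur within the first seven residues. Both the pattern transcription and the name `lipobox_cysteines` are assumptions. The text does not state which tool or pattern was actually used.

```python
import re

# Assumed transcription of PROSITE PS00013:
# {DERK}(6)-[LIVMFWSTAG](2)-[LIVMFYSTAGCQ]-[AGS]-C  (11 residues, Cys last).
# The zero-width lookahead lets overlapping candidate sites be reported.
LIPOBOX = re.compile(r"(?=[^DERK]{6}[LIVMFWSTAG]{2}[LIVMFYSTAGCQ][AGS]C)")


def lipobox_cysteines(protein: str) -> list:
    """Return 1-based positions of Cys residues that fit the assumed lipobox rules."""
    # Assumed rule: a Lys or Arg must occur among the first seven residues.
    if not any(aa in "KR" for aa in protein[:7]):
        return []
    hits = []
    for match in LIPOBOX.finditer(protein):
        cys = match.start() + 11  # 0-based start + 10 is the Cys index; +1 for 1-based
        if 15 <= cys <= 35:       # assumed rule: Cys between residues 15 and 35
            hits.append(cys)
    return hits


ORF131_1_N_TERMINUS = "MEIRAIKYTAMAALLAFTVAGCRLAGWYECSSLTGWCKPRKPAAIDFWDI"
ORF132_1_N_TERMINUS = "MKHIHIIGIGGTFMGGLAAIAKEAGFEVSGCDAKMYPPMSTQLEALGIDV"
print(lipobox_cysteines(ORF131_1_N_TERMINUS))
print(lipobox_cysteines(ORF132_1_N_TERMINUS))
```

Under these assumptions the scan reports Cys-22 of ORF131-1, which sits immediately after ...AFTVAG. The N-terminus of ORF132-1 gives no hit. The prediction for ORF132-1 in Example 102 rests on homology rather than on this motif.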
Example 102 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 865>:

1 ATGAAACACA TCCATATTAT CGGTATCGGC GGCACGTTTA TGGGCGGGCT 

51 TGCCGCCATT GCCAAAGAAG CGGGGTTTGA AGTCAGCGGT TGCGACGCGA 

101 AGATGTATCC GCCGATGAGC ACCCAGCTCG AAGCCTTGGG TATAGACGTG 

151 TATGAAGGCT TCGATGCCGC TCAGTTGGAC GAATTTAAAG CCGACGTTTA 

201 CGTTATCGGC AATGTCGCCA AGCGCGGGAT GGATGTGGTT GAAGCGATTT 

251 TGAACCTCGG CCTGCCtTAT ATtTcCGGCC CGCAATGGCT GTCGGAAAAC 

301 GTGCTGCACC ATCATTGGGT ACTCGGTGTG GCGGGGACgC ACGGCAAAAC 

351 GACCACCGCC TCCATGCTCG CATGGGTCTT GGAATATgCC GGCCTCGCGC 

401 CGGGCTTCCT TATtGGCGGC GTACC.GGAA AATttCGGCG TTTCCGCCCG 

451 CCTGCCGCAA ACGCCGCGCC AAGACCCGAA CAGCCAATCG CCGTTTTTcG 

501 TCATCGAAGC CGACGAATAC GACACCGCCT TTtTCGACAA ACGTTCTAAA 

551 TtCGTGCATT ACCGTCCGCG TACCGCCGTG TTGAACAATC TGGAATTCGA 

601 CCACGCCGAC ATCTTTGCCG ACTTGGGCGC GATACAGACc CAGTTCCACT 

651 ACCTCGTGCG TACCGTGCCG TCTGAAGGCT TAATCGTCTG CAACGGACGG 

701 CAGCAAAGCC TGCAAGATAC TTTGGACAAA GGCTGCTGGA CGCCGGTGGA 

751 AAAATTCGGC ACGGAACACG GCTGGCA. . 

This corresponds to the amino acid sequence <SEQ ID 866; ORF132>: 

1 MKHIHIIGIG GTFMGGLAAI AKEAGFEVSG CDAKMYPPMS TQLEALGIDV 

51 YEGFDAAQLD EFKADVYVIG NVAKRGMDVV EAILNLGLPY ISGPQWLSEN 

101 VLHHHWVLGV AGTHGKTTTA SMLAWVLEYA GLAPGFLIGG VXGKFRRFRP 

151 PAANAAPRPE QPIAVFRHRS RRIRHRLFRQ TFXIRALPSA YRRVEQSGIR 

201 PRRHLCRLGR DTDPVPLPRA YRAVXRLNRL QRTAAKPARY FGQRLLDAGG 

251 KIRHGTRLA. . 

Further work revealed the complete nucleotide sequence <SEQ ID 867>: 

1 ATGAAACACA TCCATATTAT CGGTATCGGC GGCACGTTTA TGGGCGGGCT 

51 TGCCGCCATT GCCAAAGAAG CGGGGTTTGA AGTCAGCGGT TGCGACGCGA 

101 AGATGTATCC GCCGATGAGC ACCCAGCTCG AAGCCTTGGG TATAGACGTG 

151 TATGAAGGCT TCGATGCCGC TCAGTTGGAC GAATTTAAAG CCGACGTTTA 

201 CGTTATCGGC AATGTCGCCA AGCGCGGGAT GGATGTGGTT GAAGCGATTT 

251 TGAACCTCGG CCTGCCTTAT ATTTCCGGCC CGCAATGGCT GTCGGAAAAC 

301 GTGCTGCACC ATCATTGGGT ACTCGGTGTG GCGGGGACGC ACGGCAAAAC 

351 GACCACCGCC TCCATGCTCG CATGGGTCTT GGAATATGCC GGCCTCGCGC 

401 CGGGCTTCCT TATTGGCGGC GTACCGGAAA ATTTCGGCGT TTCCGCCCGC 

451 CTGCCGCAAA CGCCGCGCCA AGACCCGAAC AGCCAATCGC CGTTTTTCGT 

501 CATCGAAGCC GACGAATACG ACACCGCCTT TTTCGACAAA CGTTCTAAAT 

551 TCGTGCATTA CCGTCCGCGT ACCGCCGTGT TGAACAATCT GGAATTCGAC 

601 CACGCCGACA TCTTTGCCGA CTTGGGCGCG ATACAGACCC AGTTCCACTA 

651 CCTCGTGCGT ACCGTGCCGT CTGAAGGCTT AATCGTCTGC AACGGACGGC 

701 AGCAAAGCCT GCAAGATACT TTGGACAAAG GCTGCTGGAC GCCGGTGGAA 

751 AAATTCGGCA CGGAACACGG CTGGCAGGCC GGCGAAGCCA ATGCCGACGG 

801 CTCGTTCGAC GTGTTGCTCG ACGGCAAAAC CGCCGGACGC GTCAAATGGG 

851 ATTTGATGGG CAGGCACAAC CGCATGAACG CGCTCGCCGT CATTGCCGCC 

901 GCGCGTCATG TCGGTGTCGA TATTCAGACC GCCTGCGAAG CCTTGGGCGC 

951 GTTTAAAAAC GTCAAACGCC GGATGGAAAT CAAAGGCACG GCAAACGGCA 

1001 TCACCGTTTA CGACGACTTC GCCCACCACC CGACCGCCAT CGAAACCACG 

1051 ATTCAAGGTT TGCGCCAACG CGTCGGCGGC GCGCGCATCC TCGCCGTCCT 

1101 CGAACCGCGT TCCAACACGA TGAAGCTGGG CACGATGAAG TCCGCCCTGC 

1151 CTGTAAGCCT CAAAGAAGCC GACCAAGTGT TCTGCTACGC CGGCGGCGTG 

1201 GACTGGGACG TCGCCGAAGC CCTCGCGCCT TTGGGCGGCA GGCTGAACGT 

1251 CGGCAAAGAC TTCGATGCCT TCGTTGCCGA AATCGTGAAA AACGCCGAAG 

1301 TAGGCGACCA TATTTTGGTG ATGAGCAACG GCGGTTTCGG CGGAATACAC 

1351 GGAAAGCTGC TGGAAGCTTT GAGATAG 

This corresponds to the amino acid sequence <SEQ ID 868; ORF132-1>: 

1 MKHIHIIGIG GTFMGGLAAI AKEAGFEVSG CDAKMYPPMS TQLEALGIDV 

51 YEGFDAAQLD EFKADVYVIG NVAKRGMDVV EAILNLGLPY ISGPQWLSEN 

101 VLHHHWVLGV AGTHGKTTTA SMLAWVLEYA GLAPGFLIGG VPENFGVSAR 

151 LPQTPRQDPN SQSPFFVIEA DEYDTAFFDK RSKFVHYRPR TAVLNNLEFD 

201 HADIFADLGA IQTQFHYLVR TVPSEGLIVC NGRQQSLQDT LDKGCWTPVE 

251 KFGTEHGWQA GEANADGSFD VLLDGKTAGR VKWDLMGRHN RMNALAVIAA 

301 ARHVGVDIQT ACEALGAFKN VKRRMEIKGT ANGITVYDDF AHHPTAIETT 
351 IQGLRQRVGG ARILAVLEPR SNTMKLGTMK SALPVSLKEA DQVFCYAGGV 
401 DWDVAEALAP LGGRLNVGKD FDAFVAEIVK NAEVGDHILV MSNGGFGGIH 
451 GKLLEALR* 

Computer analysis of this amino acid sequence gave the following results: 



Homology with the hypothetical o457 protein of E.coli (accession number U14003) 
ORF132 and o457 show 58% aa identity in 140 aa overlap: 

Orf132:    4 IHIIGIGGTFMGGLAAIAKEAGFEVSGCDAKMYPPMSTQLEALGIDVYEGFDAAQLDEFK 63
             IHI+GI GTFMGGLA +A++ G EV+G DA +YPPMST LE  GI++ +G+DA+QL+  +
o457:      3 IHILGICGTFMGGLAMLARQLGHEVTGSDANVYPPMSTLLEKQGIELIQGYDASQLEP-Q 61

Orf132:   64 ADVYVIGNVAKRGMDVVEAILNLGLPYISGPQWLSENVLHHHWVLGVAGTHGKTTTASML 123
              D+ +IGN   RG   VEA+L   +PY+SGPQWL + VL   WVL VAGTHGKTTTA M
o457:     62 PDLVIIGNAMTRGNPCVEAVLEKNIPYMSGPQWLHDFVLRDRWVLAVAGTHGKTTTAGMA 121

Orf132:  124 AWVLEYAGLAPGFLIGGVXG 143
              W+LE  G  PGF+IGGV G
o457:    122 TWILEQCGYKPGFVIGGVPG 141



Homology with a predicted ORF from N. meningitidis (strain A) 

ORF132 shows 74.6% identity over a 189 aa overlap with an ORF (ORF132a) from strain A of N. 
meningitidis: 



orf132.pep  MKHIHIIGIGGTFMGGLAAIAKEAGFEVSGCDAKMYPPMSTQLEALGIDVYEGFDAAQLD
orf132a     MKHIHIIGIGGTFMGGIAAIAKEAGFEXSGCDAKMYPPMSTQLEALGIGVYEGFDTAQLD

orf132.pep  EFKADVYVIGNVAKRGMDVVEAILNLGLPYISGPQWLSENVLHHHWVLGVAGTHGKTTTA
orf132a     EFKADVYVIGNVAKRGMDVVEAILNRGLPYISGPQWLAENXLHHHWXLGVAXTHGKTTTA


130 140 150 160 

orf 132. pep SMLAWVLEYAGLAPGFLIGGVXGKFR RFRPPAANAAPRPEQPI AVFR 

I I I I I I I I I I II I I I I I I II : I 1:1: I : : I z I I 

orf 132a SMLAWVLEYAGLAPGFXIGGVPENFSVSARL-PQTPRQDPNSQSPFFVIEADEYDTAFFD 

130 140 150 160 170 



170 180 190 200 210 220 

orf 132 . pep HRSRRIRHRLFRQTFXIRALPSAYRRVEQSGIRPRRHLCRLGRDTDPVPLPRAYRAVXRL 
ill: :::| 

orf 132a KRSKFVHYRPRTAVLNNLEFDHADIFADLGAIQTQFHHLVRTVPSEGLIVCNGRQQSLQD 
180 190 200 210 220 230 



The complete length ORF132a nucleotide sequence <SEQ ID 869> is: 



1 ATGAAACACA TCCACATTAT CGGTATCGGC GGCACGTTTA TGGGTGGGAT 

51 TGCCGCCATT GCCAAAGAAG CAGGGTTTGA ANTCAGCGGT TGCGATGCGA 

101 AGATGTATCC GCCGATGAGC ACCCAGCTCG AAGCCTTGGG CATAGGCGTG 

151 TATGAAGGCT TCGACACCGC GCAGTTGGAC GAATTTAAAG CCGACGTTTA 

201 CGTTATCGGC AATGTCGCCA AGCGCGGGAT GGATGTGGTT GAAGCGATTT 

251 TGAACCGTGG GCTGCCTTAT ATTTCCGGCC CGCAATGGCT GGCTGAAAAC 

301 NTGCTGCACC ATCATTGGNN ACTCGGCGTG GCGGNGACGC ACGGCAAAAC 

351 GACCACCGCG TCTATGCTCG CGTGGGTTTT GGAATATGCC GGACTCGCAC 

401 CGGGCTTCNT TATCGGCGGC GTACCGGAAA ACTTCAGCGT TTCCGCCCGC 

451 CTGCCGCAAA CGCCGCGCCA AGACCCGAAC AGCCAATCGC CGTTTTTCGT 

501 CATTGAAGCC GACGAATACG ACACCGCGTT TTTCGACAAA CGCTCCAAAT 

551 TCGTGCATTA CCGTCCGCGT ACCGCCGTGT TGAACAATCT GGAATTCGAC 

601 CACGCCGACA TCTTCGCCGA TTTGGGCGCG ATACAGACCC AGTTCCACCA 

651 CCTCGTGCGT ACCGTGCCGT CTGAAGGCCT CATCGTCTGC AACGGACGGC 

701 AGCAAAGCCT GCAAGACACT TTGGACAAAG GCTGCTGGAC GCCGGTGGAA 

751 AAATTCGGCA CGGAACACGG CTGGCAGGCC GGCGAAGCCA ATGCCGATGG 
801 CTCGTTCGAC GTGTTGCTTG ACGGCAAAAA AGCCGGACAC GTCGCTTGGA 

851 GTTTGATGGG CGGACACAAC CGCATGAACG CGCTCGCNGT CATCGCCGCC 

901 GCGCGTCATG CCGGAGTNGA CATTCAGACG GCCTGCGAAG CCTTGAGCAC 

951 GTTTAAAAAC GTCAAACGCC GCATGGAAAT CAAAGGCACG GCAAACGGTA 

1001 TCACCGTTTA CGACGACTTC GCCCACCATC CGACCGCTAT CGAAACCACG 

1051 ATTCAAGGTT TGCGCCAGCG CGTCGGCGGC GCGCGCATCC TCGCCGTCCT 

1101 CGAACCGCGT TCCAATACGA TGAAGCTGGG TACGATGAAA GCCGCCCTGC 

1151 CCGCAAGCCT CAAAGAAGCC GACCAAGTGT TCTGNTACGC CGGCGGCGCG 

1201 GACTGGGACG TTGCCGAAGC CCTCGCGCCT TTGGGCGGCA GGCTGCACGT 

1251 CGGCAAAGAC TTCGATGCCT TCGTTGCCGA AATCGTGAAA AACGCCGAAG 

1301 CAGGCGACCA TATTTTGGTG ATGAGCAACG GCGGTTTCGG CGGAATACAC 

1351 ACCAAACTGC TGGACGCTTT GAGATAG 

This encodes a protein having amino acid sequence <SEQ ID 870>: 

1 MKHIHIIGIG GTFMGGIAAI AKEAGFEXSG CDAKMYPPMS TQLEALGIGV
51 YEGFDTAQLD EFKADVYVIG NVAKRGMDVV EAILNRGLPY ISGPQWLAEN
101 XLHHHWXLGV AXTHGKTTTA SMLAWVLEYA GLAPGFXIGG VPENFSVSAR
151 LPQTPRQDPN SQSPFFVIEA DEYDTAFFDK RSKFVHYRPR TAVLNNLEFD
201 HADIFADLGA IQTQFHHLVR TVPSEGLIVC NGRQQSLQDT LDKGCWTPVE
251 KFGTEHGWQA GEANADGSFD VLLDGKKAGH VAWSLMGGHN RMNALAVIAA
301 ARHAGVDIQT ACEALSTFKN VKRRMEIKGT ANGITVYDDF AHHPTAIETT
351 IQGLRQRVGG ARILAVLEPR SNTMKLGTMK AALPASLKEA DQVFXYAGGA
401 DWDVAEALAP LGGRLHVGKD FDAFVAEIVK NAEAGDHILV MSNGGFGGIH
451 TKLLDALR*



ORF132a and ORF132-1 show 93.9% identity in 458 aa overlap: 



orf132a.pep  MKHIHIIGIGGTFMGGIAAIAKEAGFEXSGCDAKMYPPMSTQLEALGIGVYEGFDTAQLD
orf132-1     MKHIHIIGIGGTFMGGLAAIAKEAGFEVSGCDAKMYPPMSTQLEALGIDVYEGFDAAQLD

orf132a.pep  EFKADVYVIGNVAKRGMDVVEAILNRGLPYISGPQWLAENXLHHHWXLGVAXTHGKTTTA
orf132-1     EFKADVYVIGNVAKRGMDVVEAILNLGLPYISGPQWLSENVLHHHWVLGVAGTHGKTTTA

orf132a.pep  SMLAWVLEYAGLAPGFXIGGVPENFSVSARLPQTPRQDPNSQSPFFVIEADEYDTAFFDK
orf132-1     SMLAWVLEYAGLAPGFLIGGVPENFGVSARLPQTPRQDPNSQSPFFVIEADEYDTAFFDK

orf132a.pep  RSKFVHYRPRTAVLNNLEFDHADIFADLGAIQTQFHHLVRTVPSEGLIVCNGRQQSLQDT
orf132-1     RSKFVHYRPRTAVLNNLEFDHADIFADLGAIQTQFHYLVRTVPSEGLIVCNGRQQSLQDT

orf132a.pep  LDKGCWTPVEKFGTEHGWQAGEANADGSFDVLLDGKKAGHVAWSLMGGHNRMNALAVIAA
orf132-1     LDKGCWTPVEKFGTEHGWQAGEANADGSFDVLLDGKTAGRVKWDLMGRHNRMNALAVIAA

orf132a.pep  ARHAGVDIQTACEALSTFKNVKRRMEIKGTANGITVYDDFAHHPTAIETTIQGLRQRVGG
orf132-1     ARHVGVDIQTACEALGAFKNVKRRMEIKGTANGITVYDDFAHHPTAIETTIQGLRQRVGG

orf132a.pep  ARILAVLEPRSNTMKLGTMKAALPASLKEADQVFXYAGGADWDVAEALAPLGGRLHVGKD
orf132-1     ARILAVLEPRSNTMKLGTMKSALPVSLKEADQVFCYAGGVDWDVAEALAPLGGRLNVGKD

orf132a.pep  FDAFVAEIVKNAEAGDHILVMSNGGFGGIHTKLLDALRX
orf132-1     FDAFVAEIVKNAEVGDHILVMSNGGFGGIHGKLLEALRX

Homology with a predicted ORF from N. gonorrhoeae 

ORF132 shows 89.6% identity over 259 aa overlap with a predicted ORF (ORF132ng) from N. 
gonorrhoeae: 

orf132.pep  MKHIHIIGIGGTFMGGLAAIAKEAGFEVSGCDAKMYPPMSTQLEALGIDVYEGFDAAQLD  60
orf132ng    MKHIHIIGIGGTFMGGIAAIAKEAGFKVSGCDAKMYPPMSTQLEALGIGVHEGFDAAQLE  60

orf132.pep  EFKADVYVIGNVAKRGMDVVEAILNLGLPYISGPQWLSENVLHHHWVLGVAGTHGKTTTA  120
orf132ng    EFQADIYVIGNVARRGMDVVEAILNRGLPYISGPQWLAENVLHHHWVLGVAGTHGKTTTA  120

orf132.pep  SMLAWVLEYAGLAPGFLIGGVXGKFRRFRPPAANAAPRPEQPIAVFRHRSRRIRHRLFRQ  180
orf132ng    SMLAWVLEYAGLAPGFLIGGVPGKFRRFRPPTANAASRPEQQIAVFRHRSRRIRHRLFRQ  180

orf132.pep  TFXIRALPSAYRRVEQSGIRPRRHLCRLGRDTDPVPLPRAYRAVXRLNRLQRTAAKPARY  240
orf132ng    TLQIRALSPAYRRVEQSGIRPRRHLRRLGRDTDPVPPPRAHRTIRRPHRLQRTAAKPARY  240

orf132.pep  FGQRLLDAGGKIRHGTRLA  259
orf132ng    FGQRLLDAGGKIRHRTRLADW  261

An ORF132ng nucleotide sequence <SEQ ID 871> was predicted to encode a protein having amino 
acid sequence <SEQ ID 872>: 



1 MKHIHIIGIG GTFMGGIAAI AKEAGFKVSG CDAKMYPPMS TQLEALGIGV 

51 HEGFDAAQLE EFQADIYVIG NVARRGMDVV EAILNRGLPY ISGPQWLAEN 

101 VLHHHWVLGV AGTHGKTTTA SMLAWVLEYA GLAPGFLIGG VPGKFRRFRP 

151 PTANAASRPE QQIAVFRHRS RRIRHRLFRQ TLQIRALSPA YRRVEQSGIR 

201 PRRHLRRLGR DTDPVPPPRA HRTIRRPHRL QRTAAKPARY FGQRLLDAGG 

251 KIRHRTRLAD W* 

Further work revealed the following gonococcal DNA sequence <SEQ ID 873>: 



1 ATGAAACACA TCCACATTAT CGGTATCGGC GGCACGTTTA TGGGCGGGAT 

51 TGCCGCCATT GCCAAAGAAG CCGGGTTCAA AGTCAGCGGT TGCGACGCGA 

101 AGATGTATCC GCCGATGAGC ACCCAGCTCG AAGCCTTGGG CATAGGCGTA 

151 CACGAAGGCT TCGATGCCGC GCAGTTGGAA GAATTTCAAG CCGATATTTA 

201 CGTCATCGGC AATGTCGCCA GGCGCGGGAT GGATGTGGTC GAGGCGATTT 

251 TGAACCGTGG GCTGCCTTAT ATTTCCGGCC CGCAATGGCT GGCTGAAAac 

301 GTGCtgcacc atcaTTGGgt ACTCGGCGTG GcagggaCGC ACGGcaaAac 

351 gaccaCcGcg tCCATGCTCG CCTGGGTCTT GGAATATGCC GGACTCGCGC 

401 CGGGCTTCCT CATCGGCGGt gtaccggaAA ATTTCGGCGT TTCCGCCCGC 

451 CTACCGCAAA CGCCGCGTCA AGACCCGAAC AGCAAATCGC CGTTTTTCGT 

501 CATCGAAGCC GACGAATACG ACACCGCCTT TTTCGACAAA CGCTCCAAAT 

551 TCGTGCATTA TCGCCCGCGT ACCGCCGTGT TGAACAATCT GGAATTCGAC 

601 CACGCCGACA TCTTCGCCGA CTTGGGCGCG ATACAGACCC AGTTCCACCA 

651 CCTCGTGCGC ACCGTACCAT CCGAAGGCCT CATCGTCTGC AACGGACAGC 

701 AGCAAAGCCT GCAAGATACT TTGGACAAAG GCTGCTGGAC GCCGGTGGAA 

751 AAATTCGGCA CCGGACACGG CTGGCAGATT GGTGAAGTCA ATGCCGACGG 

801 CTCGTTCGAC GTATTGCTTG ACGGCAAAAA AGCCGGACAC GTCGCATGGG 

851 ATTTGATGGG CGGACACAAC CGCATGAACG CGCTCGCCGT CATCGCTGCC 

901 GCACGCCATG CCGGAGTCGA TGTTCAGACG GCCTGCGAAG CCTTGGGTGC 

951 GTTTAAAAAC GTCAAACGCC GCATGGAAAT CAAAGGCACG GCAAACGGCA 

1001 TCACCGTTTA CGACGATTTC GCCCACCACC CGACCGCCAT CGAAACCACG 

1051 ATTCAAGGTT TGCGCCAACG TGTCGGCGGC GCGCGCATCC TCGCCGTCCT 

1101 CGAGCCGCGT TCCAACACCA TGAAACTCGG CACGATGAAG TCCGCCCTGC 

1151 CCGCAAGCCT CAAAGAAGCC GACCAAGTGT TCTGCTACGC CGGCGGCGCG 

1201 GACTGGGACG TTGCCGAAGC CCTCGCGCCT TTGGGCTGCA GGCTGCGCGT 

1251 CGGTAAAGAT TTCGATACCT TCGTTGCCGA AATTGTGAAA AACGCCCGAA 

1301 CCGGCGACCA TATTTTGGTG ATGAGCAACG GCGGTTTCGG CGGAATACAC 

1351 ACCAAACTGC TGGACGCTTT GAGATAG 

This corresponds to the amino acid sequence <SEQ ID 874; ORF132ng-1>: 

1 MKHIHIIGIG GTFMGGIAAI AKEAGFKVSG CDAKMYPPMS TQLEALGIGV 

51 HEGFDAAQLE EFQADIYVIG NVARRGMDVV EAILNRGLPY ISGPQWLAEN 

101 VLHHHWVLGV AGTHGKTTTA SMLAWVLEYA GLAPGFLIGG VPENFGVSAR 

151 LPQTPRQDPN SKSPFFVIEA DEYDTAFFDK RSKFVHYRPR TAVLNNLEFD 

201 HADIFADLGA IQTQFHHLVR TVPSEGLIVC NGQQQSLQDT LDKGCWTPVE 

251 KFGTGHGWQI GEVNADGSFD VLLDGKKAGH VAWDLMGGHN RMNALAVIAA 

301 ARHAGVDVQT ACEALGAFKN VKRRMEIKGT ANGITVYDDF AHHPTAIETT 

351 IQGLRQRVGG ARILAVLEPR SNTMKLGTMK SALPASLKEA DQVFCYAGGA 

401 DWDVAEALAP LGCRLRVGKD FDTFVAEIVK NARTGDHILV MSNGGFGGIH 

451 TKLLDALR* 
ORF132ng-1 and ORF132-1 show 93.2% identity in 458 aa overlap:

orf132ng-1.pep  MKHIHIIGIGGTFMGGIAAIAKEAGFKVSGCDAKMYPPMSTQLEALGIGVHEGFDAAQLE
orf132-1        MKHIHIIGIGGTFMGGLAAIAKEAGFEVSGCDAKMYPPMSTQLEALGIDVYEGFDAAQLD

orf132ng-1.pep  EFQADIYVIGNVARRGMDVVEAILNRGLPYISGPQWLAENVLHHHWVLGVAGTHGKTTTA
orf132-1        EFKADVYVIGNVAKRGMDVVEAILNLGLPYISGPQWLSENVLHHHWVLGVAGTHGKTTTA

orf132ng-1.pep  SMLAWVLEYAGLAPGFLIGGVPENFGVSARLPQTPRQDPNSKSPFFVIEADEYDTAFFDK
orf132-1        SMLAWVLEYAGLAPGFLIGGVPENFGVSARLPQTPRQDPNSQSPFFVIEADEYDTAFFDK

orf132ng-1.pep  RSKFVHYRPRTAVLNNLEFDHADIFADLGAIQTQFHHLVRTVPSEGLIVCNGQQQSLQDT
orf132-1        RSKFVHYRPRTAVLNNLEFDHADIFADLGAIQTQFHYLVRTVPSEGLIVCNGRQQSLQDT

orf132ng-1.pep  LDKGCWTPVEKFGTGHGWQIGEVNADGSFDVLLDGKKAGHVAWDLMGGHNRMNALAVIAA
orf132-1        LDKGCWTPVEKFGTEHGWQAGEANADGSFDVLLDGKTAGRVKWDLMGRHNRMNALAVIAA

orf132ng-1.pep  ARHAGVDVQTACEALGAFKNVKRRMEIKGTANGITVYDDFAHHPTAIETTIQGLRQRVGG
orf132-1        ARHVGVDIQTACEALGAFKNVKRRMEIKGTANGITVYDDFAHHPTAIETTIQGLRQRVGG

orf132ng-1.pep  ARILAVLEPRSNTMKLGTMKSALPASLKEADQVFCYAGGADWDVAEALAPLGCRLRVGKD
orf132-1        ARILAVLEPRSNTMKLGTMKSALPVSLKEADQVFCYAGGVDWDVAEALAPLGGRLNVGKD

orf132ng-1.pep  FDTFVAEIVKNARTGDHILVMSNGGFGGIHTKLLDALRX
orf132-1        FDAFVAEIVKNAEVGDHILVMSNGGFGGIHGKLLEALRX

In addition, ORF132ng-1 is homologous to a hypothetical E. coli protein:

pir||S56459 hypothetical protein o457 - Escherichia coli >gi|537075 (U14003)
ORF_o457 [Escherichia coli] >gi|1790680 (AE000494) hypothetical 48.5 kD protein
in fbp-pmbA intergenic region [Escherichia coli] Length = 457
Score = 474 bits (1207), Expect = e-133
Identities = 249/439 (56%), Positives = 294/439 (66%), Gaps = 13/439 (2%)

Query:  22 KEAGFKVSGCDAKMYPPMSTQLEALGIGVHEGFDAAQLEEFQADIYVIGNVARRGMDVVE 81
           ++ G +V+G DA +YPPMST LE  GI + +G+DA+QLE  Q D+ +IGN   RG   VE
Sbjct:  21 RQLGHEVTGSDANVYPPMSTLLEKQGIELIQGYDASQLEP-QPDLVIIGNAMTRGNPCVE 79

Query:  82 AILNRGLPYISGPQWLAENVLHHHWVLGVAGTHGKTTTASMLAWVLEYAGLAPGFLIGGV 141
           A+L + +PY+SGPQWL + VL   WVL VAGTHGKTTTA M  W+LE  G  PGF+IGGV
Sbjct:  80 AVLEKNIPYMSGPQWLHDFVLRDRWVLAVAGTHGKTTTAGMATWILEQCGYKPGFVIGGV 139

Query: 142 PENFGVSARLPQTPRQDPNSKSPFFVIEADEYDTAFFDKRSKFVHYRPRTAVLNNLEFDH 201
           P NF VSA L          +S FFVIEADEYD AFFDKRSKFVHY PRT +LNNLEFDH
Sbjct: 140 PGNFEVSAHL---------GESDFFVIEADEYDCAFFDKRSKFVHYCPRTLILNNLEFDH 190

Query: 202 ADIFADLGAIQTQFHHLVRTVPSEGLIVCNGQQQSLQDTLDKGCWTPVEKFGTGHGWQIG 261
           ADIF DL AIQ QFHHLVR VP +G I+      +L+ T+  GCW+  E  G    WQ
Sbjct: 191 ADIFDDLKAIQKQFHHLVRIVPGQGRIIWPENDINLKQTMAMGCWSEQELVGEQGHWQAK 250

Query: 262 EVNADGS-FDVLLDGKKAGHVAWDLMGGHNRMNALAVIAAARHAGVDVQTACEALGAFKN 320
           ++  D S ++VLLDG+K G V W L+G HN  N L  IAAARH GV    A  ALG+F N
Sbjct: 251 KLTTDASEWEVLLDGEKVGEVKWSLVGEHNMHNGLMAIAAARHVGVAPADAANALGSFIN 310

Query: 321 VKRRMEIKGTANGITVYDDFAHHPTAIETTIQGLRQRVGG-ARILAVLEPRSNTMKLGTM 379
            +RR+E++G ANG+TVYDDFAHHPTAI  T+  LR +VGG ARI+AVLEPRSNTMK+G
Sbjct: 311 ARRRLELRGEANGVTVYDDFAHHPTAILATLAALRGKVGGTARIIAVLEPRSNTMKMGIC 370

Query: 380 KSALPASLKEADQVF-CYAGGADWDVAEALAPLGCRLRVGKDFDTFVAEIVKNARTGDHI 438
           K  L  SL  AD+VF        W VAE             D DT    +VK A+ GDHI
Sbjct: 371 KDDLAPSLGRADEVFLLQPAHIPWQVAEVAEACVQPAHWSGDVDTLADMVVKTAQPGDHI 430

Query: 439 LVMSNGGFGGIHTKLLDAL 457
           LVMSNGGFGGIH KLLD L
Sbjct: 431 LVMSNGGFGGIHQKLLDGL 449
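The bracketed percentages in the summary line are whole numbers derived from the counts over the 439-position alignment. The values shown (56%, 66% and 2%) agree with truncating the exact values (about 56.7%, 67.0% and 3.0%) rather than rounding them. The short sketch below only checks that arithmetic. It makes no claim about how the search program computes the figures internally, and the name `truncated_percent` is illustrative.

```python
def truncated_percent(count: int, length: int) -> int:
    """Whole-number percentage, truncated toward zero."""
    return 100 * count // length


ALIGNMENT_LENGTH = 439
for label, count in (("Identities", 249), ("Positives", 294), ("Gaps", 13)):
    exact = 100 * count / ALIGNMENT_LENGTH
    print(f"{label} = {count}/{ALIGNMENT_LENGTH}: exact {exact:.1f}%, "
          f"truncated {truncated_percent(count, ALIGNMENT_LENGTH)}%")
```

Rounding instead of truncating would give 57%, 67% and 3%, which does not match the reported line.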

Based on this analysis, it was predicted that these proteins from N. meningitidis and N. gonorrhoeae, 
and their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies. 

ORF132-1 (26.4 kDa) was cloned in pET and pGex vectors and expressed in E.coli, as described 
above. The products of protein expression and purification were analyzed by SDS-PAGE. Figure 
20A shows the results of affinity purification of the His-fusion protein, and Figure 20B shows the 
results of expression of the GST-fusion in E.coli. Purified His-fusion protein was used to immunise 
mice, whose sera were used for FACS analysis (Figure 20C) and ELISA (positive result). These 
experiments confirm that ORF132 is a surface-exposed protein, and that it is a useful immunogen. 

Example 103 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 875>:

1 . . CCGGGCTATT ACGGCTCGGA TGACGAATTT AAGCGGGCAT TCGGAGAAAA 

51 CTCGCCGACA TmCAAGAAAC ATTGCAACCG GAGCTGCGGG ATTTATGAAC 

101 CCGTATTGAA AAAATACGGC AAAAAGCGCG CCAACAACCA TTCGGTCAGC 

151 ATTAGTGCGG ACTTCGGCGA TTATTTCATG CCGTTCGCCA GCTATTCGCG 

201 CACACACCGT ATGCCCAACA TCCAAGAAAT GTATTTTTCC CAAATCGGCG 

251 ACTCCGGCGT TCACACCGCC TTAAAACCAG AGCGCGCAAA CACTTGGCAA 

301 TTTGGCTTCr ATACCTATAA AAAAGGATTG TTAAAACAAG ATGATACATT 

351 AGGATTAAAA CTGGTCGGCT ACCGCAGCCG CATCGACAAC TACATCCACA 

401 ACGTTTACGG GAAATGGTGG GATTTGAACG GGGATATTCC GAGCTGGGTC 

451 AGCAGCACCG GGCTTGCCTA CACCATCCAA CATCGCrATT TCAwAGACAA 

501 AGTGCATCAA nnnnnnnnnn nnnnnnnnnn nnnnTACGAT TATGGGCGTT 

551 TTTTCACCAA CCTTTCTTAC GCCTATCAAA AAAGCACGCA ACCGACCAAC 

601 TTCAGCGATG CGAGCGAATC GCCCAACAAT GCGTCCAAAG AAGACCAACT 

651 CAAACAAGGT TATGGGTTGA GCAGGGTTTC CGCCCTGCCG CGAGATTACG 

701 GACGTTTGGA AGTCGGTACG CGCTGGTTGG GCAACAAACT GACTTTGGGC 

751 GGCGCGATGC GCTATTTCGG CAAGAGCATC CGCGCGACGG CTGAAGAACG 

801 CTATATCGAC GGCACCAACG GGGGAAATAC CAGCAATTTC CGGCAACTGG 

851 GCAAGCGTTC CATCAAACAA ACCGAAACTC TTGCCCGCCA GCCTTTGATT 

901 TTwGATTTTa ACGCCGCTTA CGAGCCGAAG AAAAACCTTA TTTTCCGCGC 

951 CGAAGTCAAA AATCTGTTCG ACAGGCGTTA TATCGATCCG CTCGATGCGG 

1001 GCAATGATGC GGCAAC . GAG CGTTATTACA GCTCGTTCGA CCCGAAAGAC 

1051 AAGGACrrAG ACGTAACGTG TAATGCTGAT AAAACGTTGT GCaACGGCAA 

1101 ATACGGCGGC ACAAGCAAAA GCGTATTGAC CAATTTTGCA CGCGGACGCA 

1151 CCTTTTTgAT GACGATGAGC TACAAGTTTT AA 

This corresponds to the amino acid sequence <SEQ ID 876; ORF133>: 

1 . . PGYYGSDDEF KRAFGENSPT XKKHCNRSCG IYEPVLKKYG KKRANNHSVS 

51 ISADFGDYFM PFASYSRTHR MPNIQEMYFS QIGDSGVHTA LKPERANTWQ 

101 FGFXTYKKGL LKQDDTLGLK LVGYRSRIDN YIHNVYGKWW DLNGDIPSWV 

151 SSTGLAYTIQ HRXFXDKVHQ XXXXXXXXYD YGRFFTNLSY AYQKSTQPTN 

201 FSDASESPNN ASKEDQLKQG YGLSRVSALP RDYGRLEVGT RWLGNKLTLG 

251 GAMRYFGKSI RATAEERYID GTNGGNTSNF RQLGKRSIKQ TETLARQPLI 

301 XDFNAAYEPK KNLIFRAEVK NLFDRRYIDP LDAGNDAAXE RYYSSFDPKD 

351 KDXDVTCNAD KTLCNGKYGG TSKSVLTNFA RGRTFLMTMS YKF* 

Further work revealed the further partial DNA sequence <SEQ ID 877>: 

1 GAGGCGCAGA TACAGGTTTT GGAAGATGTG CACGTCAAGG CGAAGCGCGT 

51 ACCGAAAGAC AAAAAAGTGT TTACCGATGC GCGTGCCGTA TCGACCCGTC 

101 AGGATATATT CAAATCCAGC GAAAACCTCG ACAACATCGT ACGCAGCATC 

151 CCCGGTGCGT TTACACAGCA AGATAAAAGC TCGGGCATTG TGTCTTTGAA 

201 TATTCGCGGC GACAGCGGGT TCGGGCGGGT CAATACGATG GTGGACGGCA 

251 TCACGCAGAC CTTTTATTCG ACTTCTACCG ATGCGGGCAG GGCAGGCGGT 
301 TCATCTCAAT TCGGTGCATC TGTCGACAGC AATTTTATTG CCGGACTGGA

351 TGTCGTCAAA GGCAGCTTCA GCGGCTCGGC AGGCATCAAC AGCCTTGCCG

401 GTTCGGCGAA TCTGCGGACT TTAGGCGTGG ATGACGTCGT TCAGGGCAAT

451 AATACCTACG GCCTGCTGCT AAAAGGTCTG ACCGGCACCA ATTCAACCAA

501 AGGTAATGCG ATGGCGGCGA TAGGTGCGCG CAAATGGCTG GAAAGCGGAG

551 CATCTGTCGG TGTGCTTTAC GGGCACAGCA GGCGCAGCGT GGCGCAAAAT

601 TACCGCGTGG GCGGCGGCGG GCAGCACATC GGAAATTTTG GCGCGGAATA

651 TTTGGAACGG CGCAAGCAGC GATATTTTGT ACAAGAGGGT GCTTTGAAAT

701 TCAATTCCGA CAGCGGAAAA TGGGAGCGGG ATTTACAAAG GCAACAGTGG

751 AAATACAAGC CGTATAAAAA TTACAACAAC CAAGAACTAC AaAAATACAT

801 CGAAGAGCAT GACAAAAGCT GGCGGGAAAA CCTg.CaCCG CAATACGACA

851 TTACCCCCAT CGATCCGTCC AGCCTGAAGC AGCAGTCGGC AGGCAATCTG

901 TTTAAATTGG AATACGACGG CGTATTCAAT AAATACACGG CGCAATTTCG

951 CGATTTAAAC ACCAAAATCG GCAGCCGCAA AATCATCAAC CGCAATTATC

1001 AGTTCAATTA CGGTTTGTCT TTGAACCCGT ATACCAACCT CAATCTGACC

1051 GCAGCCTACA ATTCGGGCAG GCAGAAATAT CCGAAAGGGT CGAAGTTTAC

1101 AGGCTGGGGG CTTTTAAAGG ATTTTGAAAC CTACAACAAC GCGAAAATCC

1151 TCGACCTCAA CAACACCGCC ACCTTCCGGC TGCCCCGCGA AACCGAGTTG

1201 CAAACCACTT TGGGCTTCAA TTATTTCCAC AACGAATACG GCAAAAACCG

1251 CTTTCCTGAA GAATTGGGGC TGTTTTTCGA CGGTCCTGAT CAGGACAACG

1301 GGCTTTATTC CTATTTGGGG CGGTTTAAGG GCGATAAAGG GCTGCTGCCC

1351 CAAAAATCAA CCATTGTCCA ACCGGCCGGC AGCCAATATT TCAACACGTT

1401 CTACTTCGAT GCCGCGCTCA AAAAAGACAT TTACCGCTTA AACTACAGCA

1451 CCAATACCGT CGGCTACCGT TTCGGCGGCG AATATACGGG CTATTACGGC

1501 TCGGATGACG AATTTAAGCG GGCATTCGGA GAAAACTCGC CGACATACAA

1551 GAAACATTGC AACCGGAGCT GCGGGATTTA TGAACCCGTA TTGAAAAAAT

1601 ACGGCAAAAA GCGCGCCAAC AACCATTCGG TCAGCATTAG TGCGGACTTC

1651 GGCGATTATT TCATGCCGTT CGCCAGCTAT TCGCGCACAC ACCGTATGCC

1701 CAACATCCAA GAAATGTATT TTTCCCAAAT CGGCGACTCC GGCGTTCACA

1751 CCGCCTTAAA ACCAGAGCGC GCAAACACTT GGCAATTTGG CTTCAATACC

1801 TATAAAAAAG GATTGTTAAA ACAAGATGAT ACATTAGGAT TAAAACTGGT

1851 CGGCTACCGC AGCCGCATCG ACAACTACAT CCACAACGTT TACGGGAAAT

1901 GGTGGGATTT GAACGGGGAT ATTCCGAGCT GGGTCAGCAG CACCGGGCTT

1951 GCCTACACCA TCCAACATCG CAATTTCAAA GACAAAGTGC ACAAACACGG

2001 TTTTGAGTTG GAGCTGAATT ACGATTATGG GCGTTTTTTC ACCAACCTTT

2051 CTTACGCCTA TCAAAAAAGC ACGCAACCGA CCAACTTCAG CGATGCGAGC

2101 GAATCGCCCA ACAATGCGTC CAAAGAAGAC CAACTCAAAC AAGGTTATGG

2151 GTTGAGCAGG GTTTCCGCCC TGCCGCGAGA TTACGGACGT TTGGAAGTCG

2201 GTACGCGCTG GTTGGGCAAC AAACTGACTT TGGGCGGCGC GATGCGCTAT

2251 TTCGGCAAGA GCATCCGCGC GACGGCTGAA GAACGCTATA TCGACGGCAC

2301 CAACGGGGGA AATACCAGCA ATTTCCGGCA ACTGGGCAAG CGTTCCATCA

2351 AACAAACCGA AACTCTTGCC CGCCAGCCTT TGATTTTTGA TTTTTACGCC

2401 GCTTACGAGC CGAAGAAAAA CCTTATTTTC CGCGCCGAAG TCAAAAATCT

2451 GTTCGACAGG CGTTATATCG ATCCGCTCGA TGCGGGCAAT GATGCGGCAA

2501 CGCAGCGTTA TTACAGCTCG TTCGACCCGA AAGACAAGGA CGAAGACGTA

2551 ACGTGTAATG CTGATAAAAC GTTGTGCAAC GGCAAATACG GCGGCACAAG

2601 CAAAAGCGTA TTGACCAATT TTGCACGCGG ACGCACCTTT TTGATGACGA

2651 TGAGCTACAA GTTTTAA



This corresponds to the amino acid sequence <SEQ ID 878; ORF133-1>:

1 EAQIQVLEDV HVKAKRVPKD KKVFTDARAV STRQDIFKSS ENLDNIVRSI
51 PGAFTQQDKS SGIVSLNIRG DSGFGRVNTM VDGITQTFYS TSTDAGRAGG
101 SSQFGASVDS NFIAGLDVVK GSFSGSAGIN SLAGSANLRT LGVDDVVQGN
151 NTYGLLLKGL TGTNSTKGNA MAAIGARKWL ESGASVGVLY GHSRRSVAQN
201 YRVGGGGQHI GNFGAEYLER RKQRYFVQEG ALKFNSDSGK WERDLQRQQW
251 KYKPYKNYNN QELQKYIEEH DKSWRENLXP QYDITPIDPS SLKQQSAGNL
301 FKLEYDGVFN KYTAQFRDLN TKIGSRKIIN RNYQFNYGLS LNPYTNLNLT
351 AAYNSGRQKY PKGSKFTGWG LLKDFETYNN AKILDLNNTA TFRLPRETEL
401 QTTLGFNYFH NEYGKNRFPE ELGLFFDGPD QDNGLYSYLG RFKGDKGLLP
451 QKSTIVQPAG SQYFNTFYFD AALKKDIYRL NYSTNTVGYR FGGEYTGiTG
501 SDDEFKRAFG ENSPTYKKHC NRSCGIYEPV LKKYGKKRAN NHSVSISADF
551 GDYFMPFASY SRTHRMPNIQ EMYFSQIGDS GVHTALKPER ANTWQFGFNT
601 YKKGLLKQDD TLGLKLVGYR SRIDNYIHNV YGKWWDLNGD IPSWVSSTGL
651 AYTIQHRNFK DKVHKHGFEL ELNYDYGRFF TNLSYAYQKS TQPTNFSDAS
701 ESPNNASKED QLKQGYGLSR VSALPRDYGR LEVGTRWLGN KLTLGGAMRY
751 FGKSIRATAE ERYIDGTNGG NTSNFRQLGK RSIKQTETLA RQPLIFDFYA
801 AYEPKKNLIF RAEVKNLFDR RYIDPLDAGN DAATQRYYSS FDPKDKDEDV
851 TCNADKTLCN GKYGGTSKSV LTNFARGRTF LMTMSYKF*



ENLDNIVRSI 
TSTDAGRAGG 
LGVDDWQGN 
GHSRRSVAQN 
WERDLQRQQW 
SLKQQSAGNL 
LNPYTNLNLT 
TFRLPRETEL 
RFKGDKGLLP 
FGGEYTGiTG 
NHSVSISADF 
ANTWQFGFNT 
IPSWVSSTGL 
TQPTNFSDAS 
KLTLGGAMRY 
RQPLIFDFYA 
FDPKDKDEDV 



Computer analysis of this amino acid sequence gave the following results:

Homology with the probable TonB-dependent receptor HI1217 of H. influenzae (accession number U32801)
ORF133 and HI1217 show 57% aa identity in 363 aa overlap:

Orf133: 31 IYEPVLKKYGKKRANNHSVSISADFGDYFMPFASYSRTHRMPNIQEMYFSQIGDSGVHTA 90
I EP+L K G K+A NHS ++SA+ DYFMPF +YSRTHRMPNIQEM+FSQ+ ++GV+TA

Orf133: 91 LKPERANTWQFGFXTYKKGLLKQDDTLGLKLVGYRSRIDNYIHNVYGKWWDLNGDIPSWV 150
LKPE+++T+Q GF TYKKGL QDD LG+KLVGYRS I NYIHNVYG WW +P+W

[The HI121 rows of this alignment (starting at HI121 residues 563, 623, 681, 741, 801, 860 and 911) and the Orf133 rows starting at residues 151, 211, 271, 331 and 391 are not legible in the source.]



Homology with a predicted ORF from N. meningitidis (strain A)

ORF133 shows 90.8% identity over a 392aa overlap with an ORF (ORF133a) from strain A of N.
meningitidis:

10 20 30
orf133.pep                              PGYYGSDDEFKRAFGENSPTXKKHCNRSCGI
orf133a    FYFDAALKKDIYRLNYSTNTVGYRFGGXYTGYYXSDDEFKRAFGENSPTYXKHCNQSCGI
450 460 470 480 490 500

40 50 60 70 80 90
orf133.pep YEPVLKKYGKKRANNHSVSISADFGDYFMPFASYSRTHRMPNIQEMYFSQIGDSGVHTAL
orf133a    YEPVLKKYGKKRANNHSVSISADFGDYFMPFASYSRTHRMPNIQEMYFSQIGDSGVHTAL
510 520 530 540 550 560

100 110 120 130 140 150
orf133.pep KPERANTWQFGFXTYKKGLLKQDDTLGLKLVGYRSRIDNYIHNVYGKWWDLNGDIPSWVS
orf133a    KPERANTWQFGFNTYKKGLLKQDDILGLKLVGYRSRIDXYIHNVYGKWWDLNGNIPSWVS
570 580 590 600 610 620

160 170 180 190 200 210
orf133.pep STGLAYTIQHRXFXDKVHQXXXXXXXXYDYGRFFTNLSYAYQKSTQPTNFSDASESPNNA
orf133a    STGLAYTIQHRNFKDKVHKHGFELELNYDYXRFFTNLSYAYQKSTQPTNFSDASESPNNA
630 640 650 660 670 680

220 230 240 250 260 270
orf133.pep SKEDQLKQGYGLSRVSALPRDYGRLEVGTRWLGNKLTLGGAMRYFGKSIRATAEERYIDG
orf133a    SKEDQLKQGYGLSRVSALPRDYGRLEVGTRWLGNKLTLGGAMRYFGKSIRATAEERYIDX
690 700 710 720 730 740

280 290 300 310 320 330
orf133.pep TNGGNTSNFRQLGKRSIKQTETLARQPLIXDFNAAYEPKKNLIFRAEVKNLFDRRYIDPL
orf133a    TNGXXTSNFRQLGKRSIXQTETLARQPLIFDXYAAYEPKKXLIFRAEVKNLFDRRYIDPL
750 760 770 780 790 800

340 350 360 370 380 390
orf133.pep DAGNDAAXERYYSSFDPKDKDXDVTCNADKTLCNGKYGGTSKSVLTNFARGRTFLMTMSY
orf133a    DAGNDAATQRYYSSFDPKDKDEEVTCNDDNTLCNGKYGGTSKSVLTNFARGXTFLITMSY
810 820 830 840 850 860

orf133.pep KFX
orf133a    KFX
870

A partial ORF133a nucleotide sequence <SEQ ID 879> is: 

1 AAAGACAAAA AAGTGTTTAC CGATGCGCGT GCCGTATCGA CCCGTCAGGA 

51 TATATTCAAA TCCANCGAAA ACCTCGACAA CATCGTACGC ANCATCCCCG 

101 GTGCGTTTAC ACANCAANAT AAAAGCTCGG GCNTTGTGTC TTTGAATATT 

151 CGCNGCGACA GCGGGTTCGG GCGGGTCAAT ACNATGGTNG ACGGCATCAC 

201 NCANACCTTT TATTCGACTT CTACCGATGC GGGCAGGGCA GGCGGTTCAT 

251 CTCAATTCGG TGCATCTGTC GACAGCAATT TTATNGCCGG ACTGGATGTC 

301 GTCAAAGGCA GCTTCAGCGG CTCGGCAGGC ATCAACAGCC TTGCCGGTTC 

351 GGCGAATCTG CGGACTTTAN GCGTGGATGA TGTCGTTCAG GGCAATANTA 

401 CNTACGGCCT GCTGCTAAAA GGTCTGACCG GCACCAATTC AACCAAAGGT 

451 AATGCGATGG CGGCGATAGG TGCGCGCAAA TGGCTGGAAA GCGGAGCATC 

501 TGTCGGTGTG CTTTACGGGC ACAGCAGGCG CAGCGTGGCG CAAAATTACC 

551 GCGTGGGCGG CGGCGGGCAG CACATCGGAA ATTTTGGCGC GGAATATCTG 

601 GAACGACGCA AGCAACGATA TTTTGAGCAA GAAGGCGGGT TGAAATTCAA 

651 TTCCAACAGC GGAAAATGGG AGCGGGATTT CCAAAAGTCG TACTGGAAAA 

701 CCAAGTGGTA TCAAAAATAC GATGCCCCCC AAGAACTGCA AAAATACATC

751 GAAGGTCATG ATAAAAGCTG GCGGGAAAAC CTGGCGCCGC AATACGACAT 

801 CACCCCCATC GATCCGTCCA GCCTGAAGCN GCAGTCGGCA GGCAACCTGT 

851 TTAAATTGGA ATACGACGGC GTATTCAATA AATACACGGC GCAATTTCGC 

901 GATTTAAACA CCAAAATCGG CAGCCGCAAA ATCATCAACC GCAATTATCA 

951 ATTCAATTAC GGTTTGTCTT TGAACCCGTA TACCAACCTC AATCTGACCG 

1001 CAGCCTACAA TTCGGGCAGG CAGAAATATC CGAAAGGGTC GAAGTTTACA 

1051 GGCTGGGGGC TTTTNAAAGA TTTTGAAACC TACAACAACG CAAAAATCCT 

1101 CGACCTCANC AACACCTCCA CCTTCCGGCT GCCCCGTGAA ACCGAGTTGC 

1151 AAACCACTTT GGGCTTCAAT TATTTCCACA ACGAATACGG CAAAAACCGC 

1201 TTTCCTGAAG AATTGGGGCT GTTTTTCGAC GGTCCGGATC ANGACAACGG 

1251 GCTTTATTCC TATTTGGGGC GGTTTAAGGG CGATAAAGGG CTGCTGCCCC 

1301 AAAAATCAAC CATTGTCCAA CCGGCCGGCA GCCAATATTT CAACACGTTC 

1351 TACTTCGATG CCGCGCTCAA AAAAGACATT TACCGCTTAA ACTACAGCAC 

1401 CAATACCGTC GGCTACCGTT TCGGCGGCNA ATATACGGGC TATTACNGCT 

1451 CGGATGACGA ATTTAAGCGG GCATTCGGAG AAAACTCGCC GACATACANG 

1501 AAACATTGCA ACCAGAGCTG CGGAATTTAT GAACCCGTAT TGAAAAAATA 

1551 CGGCAAAAAG CGCGCCAACA ACCATTCGGT CAGCATTAGT GCGGACTTCG 

1601 GCGATTATTT CATGCCGTTC GCCAGCTATT CGCGCACACA CCGTATGCCC 

1651 AACATCCAAG AAATGTATTT TTCCCAAATC GGCGACTCCG GCGTTCACAC 

1701 CGCCTTAAAA CCAGAGCGCG CAAACACTTG GCAATTTGGC TTCAATACCT 

1751 ATAAAAAAGG ATTGTTAAAA CAAGATGATA TATTAGGATT AAAACTGGTC 

1801 GGCTACCGCA GCCGCATCGA CNACTACATC CACAACGTTT ACGGGAAATG 

1851 GTGGGATTTG AACGGGAATA TTCCGAGCTG GGTCAGCAGC ACCGGGCTTG 

1901 CCTACACCAT CCAACACCGC AATTTCAAAG ACAAAGTGCA CAAACACGGT 

1951 TTTGAGTTGG AGCTGAATTA CGATTATNGG CGTTTTTTCA CCAACCTTTC 

2001 TTACGCCTAT CAAAAAAGCA CGCAACCGAC CAACTTCAGC GATGCGAGCG 

2051 AATCGCCCAA CAATGCGTCC AAAGAAGACC AACTCAAACA AGGTTATGGG 

2101 TTGAGCAGGG TTTCCGCCCT GCCGCGAGAT TACGGACGTT TGGAAGTCGG 

2151 TACGCGCTGG TTGGGCAACA AACTGACTTT GGGCGGCGCG ATGCGCTATT 

2201 TCGGCAAGAG CATCCGCGCG ACGGCTGAAG AACGCTATAT CGACGNCACC 

2251 AATGGGGNAN NTACCAGCAA TTTCCGGCAA CTGGGCAAGC GTTCCATCAN 

2301 ACAAACCGAA ACCCTTGCCC GCCAGCCTTT GATTTTTGAT TTNTACGCCG 

2351 CTTACGAGCC GAAGAAAAAN CTTATTTTCC GCGCCGAAGT CAAAAATCTG 

2401 TTCGACAGGC GTTATATCGA TCCGCTCGAT GCGGGCAATG ATGCGGCAAC 

2451 GCAGCGTTAT TACAGTTCGT TCGACCCGAA AGACAAGGAC GAAGAAGTAA 

2501 CGTGTAATGA TGATAACACG TTATGCAACG GCAAATACGG CGGCACAAGC 

2551 AAAAGCGTAT TGACCAATTT TGCACGCGGA CNCACCTTTT TGATAACGAT 

2601 GAGCTACAAG TTTTAA 
This encodes a protein having (partial) amino acid sequence <SEQ ID 880>:

1 KDKKVFTDAR AVSTRQDIFK SXENLDNIVR XIPGAFTXQX KSSGXVSLNI

51 RXDSGFGRVN TMVDGITXTF YSTSTDAGRA GGSSQFGASV DSNFXAGLDV

101 VKGSFSGSAG INSLAGSANL RTLXVDDVVQ GNXTYGLLLK GLTGTNSTKG

151 NAMAAIGARK WLESGASVGV LYGHSRRSVA QNYRVGGGGQ HIGNFGAEYL

201 ERRKQRYFEQ EGGLKFNSNS GKWERDFQKS YWKTKWYQKY DAPQELQKYI

251 EGHDKSWREN LAPQYDITPI DPSSLKXQSA GNLFKLEYDG VFNKYTAQFR

301 DLNTKIGSRK IINRNYQFNY GLSLNPYTNL NLTAAYNSGR QKYPKGSKFT

351 GWGLXKDFET YNNAKILDLX NTSTFRLPRE TELQTTLGFN YFHNEYGKNR

401 FPEELGLFFD GPDXDNGLYS YLGRFKGDKG LLPQKSTIVQ PAGSQYFNTF

451 YFDAALKKDI YRLNYSTNTV GYRFGGXYTG YYXSDDEFKR AFGENSPTYX

501 KHCNQSCGIY EPVLKKYGKK RANNHSVSIS ADFGDYFMPF ASYSRTHRMP

551 NIQEMYFSQI GDSGVHTALK PERANTWQFG FNTYKKGLLK QDDILGLKLV

601 GYRSRIDXYI HNVYGKWWDL NGNIPSWVSS TGLAYTIQHR NFKDKVHKHG

651 FELELNYDYX RFFTNLSYAY QKSTQPTNFS DASESPNNAS KEDQLKQGYG

701 LSRVSALPRD YGRLEVGTRW LGNKLTLGGA MRYFGKSIRA TAEERYIDXT

751 NGXXTSNFRQ LGKRSIXQTE TLARQPLIFD XYAAYEPKKX LIFRAEVKNL

801 FDRRYIDPLD AGNDAATQRY YSSFDPKDKD EEVTCNDDNT LCNGKYGGTS

851 KSVLTNFARG XTFLITMSYK F*



ORF133a and ORF133-1 show 94.3% identity in 871 aa overlap:

10 20 30 40
orf133a.pep                   KDKKVFTDARAVSTRQDIFKSXENLDNIVRXIPGAFTXQXKS
orf133-1    EAQIQVLEDVHVKAKRVPKDKKVFTDARAVSTRQDIFKSSENLDNIVRSIPGAFTQQDKS
10 20 30 40 50 60

50 60 70 80 90 100
orf133a.pep SGXVSLNIRXDSGFGRVNTMVDGITXTFYSTSTDAGRAGGSSQFGASVDSNFXAGLDVVK
orf133-1    SGIVSLNIRGDSGFGRVNTMVDGITQTFYSTSTDAGRAGGSSQFGASVDSNFIAGLDVVK
70 80 90 100 110 120

110 120 130 140 150 160
orf133a.pep GSFSGSAGINSLAGSANLRTLXVDDVVQGNXTYGLLLKGLTGTNSTKGNAMAAIGARKWL
orf133-1    GSFSGSAGINSLAGSANLRTLGVDDVVQGNNTYGLLLKGLTGTNSTKGNAMAAIGARKWL
130 140 150 160 170 180

170 180 190 200 210 220
orf133a.pep ESGASVGVLYGHSRRSVAQNYRVGGGGQHIGNFGAEYLERRKQRYFEQEGGLKFNSNSGK
orf133-1    ESGASVGVLYGHSRRSVAQNYRVGGGGQHIGNFGAEYLERRKQRYFVQEGALKFNSDSGK
190 200 210 220 230 240

230 240 250 260 270 280
orf133a.pep WERDFQKSYWKTKWYQKYDAPQELQKYIEGHDKSWRENLAPQYDITPIDPSSLKXQSAGN
orf133-1    WERDLQRQQWKYKPYKNYNN-QELQKYIEEHDKSWRENLXPQYDITPIDPSSLKQQSAGN
250 260 270 280 290

290 300 310 320 330 340
orf133a.pep LFKLEYDGVFNKYTAQFRDLNTKIGSRKIINRNYQFNYGLSLNPYTNLNLTAAYNSGRQK
orf133-1    LFKLEYDGVFNKYTAQFRDLNTKIGSRKIINRNYQFNYGLSLNPYTNLNLTAAYNSGRQK
300 310 320 330 340 350

350 360 370 380 390 400
orf133a.pep YPKGSKFTGWGLXKDFETYNNAKILDLXNTSTFRLPRETELQTTLGFNYFHNEYGKNRFP
orf133-1    YPKGSKFTGWGLLKDFETYNNAKILDLNNTATFRLPRETELQTTLGFNYFHNEYGKNRFP
360 370 380 390 400 410

410 420 430 440 450 460
orf133a.pep EELGLFFDGPDXDNGLYSYLGRFKGDKGLLPQKSTIVQPAGSQYFNTFYFDAALKKDIYR
orf133-1    EELGLFFDGPDQDNGLYSYLGRFKGDKGLLPQKSTIVQPAGSQYFNTFYFDAALKKDIYR
420 430 440 450 460 470

470 480 490 500 510 520
orf133a.pep LNYSTNTVGYRFGGXYTGYYXSDDEFKRAFGENSPTYXKHCNQSCGIYEPVLKKYGKKRA
orf133-1    LNYSTNTVGYRFGGEYTGYYGSDDEFKRAFGENSPTYKKHCNRSCGIYEPVLKKYGKKRA
480 490 500 510 520 530

530 540 550 560 570 580
orf133a.pep NNHSVSISADFGDYFMPFASYSRTHRMPNIQEMYFSQIGDSGVHTALKPERANTWQFGFN
orf133-1    NNHSVSISADFGDYFMPFASYSRTHRMPNIQEMYFSQIGDSGVHTALKPERANTWQFGFN
540 550 560 570 580 590

590 600 610 620 630 640
orf133a.pep TYKKGLLKQDDILGLKLVGYRSRIDXYIHNVYGKWWDLNGNIPSWVSSTGLAYTIQHRNF
orf133-1    TYKKGLLKQDDTLGLKLVGYRSRIDNYIHNVYGKWWDLNGDIPSWVSSTGLAYTIQHRNF
600 610 620 630 640 650

650 660 670 680 690 700
orf133a.pep KDKVHKHGFELELNYDYXRFFTNLSYAYQKSTQPTNFSDASESPNNASKEDQLKQGYGLS
orf133-1    KDKVHKHGFELELNYDYGRFFTNLSYAYQKSTQPTNFSDASESPNNASKEDQLKQGYGLS
660 670 680 690 700 710

710 720 730 740 750 760
orf133a.pep RVSALPRDYGRLEVGTRWLGNKLTLGGAMRYFGKSIRATAEERYIDXTNGXXTSNFRQLG
orf133-1    RVSALPRDYGRLEVGTRWLGNKLTLGGAMRYFGKSIRATAEERYIDGTNGGNTSNFRQLG
720 730 740 750 760 770

770 780 790 800 810 820
orf133a.pep KRSIXQTETLARQPLIFDXYAAYEPKKXLIFRAEVKNLFDRRYIDPLDAGNDAATQRYYS
orf133-1    KRSIKQTETLARQPLIFDFYAAYEPKKNLIFRAEVKNLFDRRYIDPLDAGNDAATQRYYS
780 790 800 810 820 830

830 840 850 860 870
orf133a.pep SFDPKDKDEEVTCNDDNTLCNGKYGGTSKSVLTNFARGXTFLITMSYKFX
orf133-1    SFDPKDKDEDVTCNADKTLCNGKYGGTSKSVLTNFARGRTFLMTMSYKFX
840 850 860 870 880
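Each identity figure in this section relates a count of identical positions to the length of the overlap. For example, 94.3% of an 871-residue overlap is about 821 identical positions. The minimal Python sketch below recomputes such a figure for one pair of aligned rows. It rests on illustrative assumptions: the rows are already aligned and of equal length, '-' marks a gap, a blank marks a residue outside the overlap, and X against X counts as identical. The program that produced the figures above does not state its conventions, so this is not a re-implementation of it, and the name `percent_identity` is hypothetical.

```python
def percent_identity(row_a: str, row_b: str) -> tuple[int, int, float]:
    """Return (identical positions, overlap length, percent identity) for two aligned rows.

    Illustrative assumptions: equal-length rows, '-' marks a gap, a blank means the
    position lies outside the overlap, and X vs X is counted as identical.
    """
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    identical = overlap = 0
    for a, b in zip(row_a, row_b):
        if a == " " or b == " ":
            continue  # flanking residue with no partner in the other row
        overlap += 1
        if a == b and a != "-":
            identical += 1
    percent = 100.0 * identical / overlap if overlap else 0.0
    return identical, overlap, percent


# Final row of the ORF133a / ORF133-1 alignment above (one row, not the whole overlap).
orf133a = "SFDPKDKDEEVTCNDDNTLCNGKYGGTSKSVLTNFARGXTFLITMSYKFX"
orf133_1 = "SFDPKDKDEDVTCNADKTLCNGKYGGTSKSVLTNFARGRTFLMTMSYKFX"
print(percent_identity(orf133a, orf133_1))  # (45, 50, 90.0)
```

Summing identical positions and overlap across all rows would give the whole-alignment figure. Those sums, rather than an average of per-row percentages, should be used.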

Homology with a predicted ORF from N. gonorrhoeae

ORF133 shows 92.3% identity over 392 aa overlap with a predicted ORF (ORF133ng) from N. gonorrhoeae:

orf133.pep                              PGYYGSDDEFKRAFGENSPTXKKHCNRSCGI 31
orf133ng   FYFDAALKKDIYRLNYSTNAINYRFGGEYTGYYGSENEFKRAFGENSPAYKEHCDPSCGL 560

orf133.pep YEPVLKKYGKKRANNHSVSISADFGDYFMPFASYSRTHRMPNIQEMYFSQIGDSGVHTAL 91
orf133ng   YEPVLKKYGKKRANNHSVSISADFGDYFMPFAGYSRTHRMPNIQEMYFSQIGDSGVHTAL 620

orf133.pep KPERANTWQFGFXTYKKGLLKQDDTLGLKLVGYRSRIDNYIHNVYGKWWDLNGDIPSWVS 151
orf133ng   KPERANTWQFGFNTYKKGLLKQDDILGLKLVGYRSRIDNYIHNVYGKWWDLNGDIPSWVG 680

orf133.pep STGLAYTIQHRXFXDKVHQXXXXXXXXYDYGRFFTNLSYAYQKSTQPTNFSDASESPNNA 211
orf133ng   STGLAYTIRHRNFKDKVHKHGFELELNYDYGRFFTNLSYAYQKSTQPTNFSDASESPNNA 740

orf133.pep SKEDQLKQGYGLSRVSALPRDYGRLEVGTRWLGNKLTLGGAMRYFGKSIRATAEERYIDG 271
orf133ng   SKEDQLKQGYGLSRVSALPRDYGRLEVGTRWLGNKLTLGGAMRYFGKSIRATAEERYIDG 800

orf133.pep TNGGNTSNFRQLGKRSIKQTETLARQPLIXDFNAAYEPKKNLIFRAEVKNLFDRRYIDPL 331
orf133ng   TNGGNTSNVRQLGKRSIKQTETLARQPLIFDFYAAYEPKKNLIFRAEVKNLFDRRYIDPL 860

orf133.pep DAGNDAAXERYYSSFDPKDKDXDVTCNADKTLCNGKYGGTSKSVLTNFARGRTFLMTMSY 391
orf133ng   DAGNDAATQRYYSSFDPKDKDEDVTCNADKTLCNGKYGGTSKSVLTNFARGRTFLMTMSY 920

orf133.pep KF 393
orf133ng   KF 922

The complete length ORF133ng nucleotide sequence <SEQ ID 881> is predicted to encode a 
protein having amino acid sequence <SEQ ID 882>: 



1 MRSSFRLKPI CFYLMGVMLY HHSYAEDAGR AGSEAQIQVL EDVHVKAKRV

51 PKDKKVFTDA RAVSTRQDVF KSGENLDNIV RSIPGAFTQQ DKSSGIVSLN

101 IRGDSGFGRV NTMVDGITQT FYSTSTDAGR AGGSSQFGAS VDSNFIAGLD

151 VVKGSFSGSA GINSLAGSAN LRTLGVDDVV QGNNTYGLLL KGLTGTNSTK

201 GNAMAAIGAR KWLESGASVG VLYGHSRRGV AQNYRVGGGG QHIGNFGEEY

251 LERRKQQYFV QEGGLKFNAG SGKWERDLQR QYWKTKWYKK YEDPQELQKY

301 IEEHDKSWRE NLAPQYDITP IDPSGLKQQS AGNLLNLEYD GVFNKYTAQF

351 RDLNTRIGSR KIINRNYQFN YGLSLNPYTN LNLTAAYNSG RQKYPKGAKF

401 TGWGLLKDFE TYNNAKILDL NNTATFRLPR ETELQTTLGF NYFHNEYGKN

451 RFPEELGLFF DGPDQDNGLY SYLGRFKGDK GLLPQKSTIV QPAGSQYFNT

501 FYFDAALKKD IYRLNYSTNA INYRFGGEYT GYYGSENEFK RAFGENSPAY

551 KEHCDPSCGL YEPVLKKYGK KRANNHSVSI SADFGDYFMP FAGYSRTHRM

601 PNIQEMYFSQ IGDSGVHTAL KPERANTWQF GFNTYKKGLL KQDDILGLKL

651 VGYRSRIDNY IHNVYGKWWD LNGDIPSWVG STGLAYTIRH RNFKDKVHKH

701 GFELELNYDY GRFFTNLSYA YQKSTQPTNF SDASESPNNA SKEDQLKQGY

751 GLSRVSALPR DYGRLEVGTR WLGNKLTLGG AMRYFGKSIR ATAEERYIDG

801 TNGGNTSNVR QLGKRSIKQT ETLARQPLIF DFYAAYEPKK NLIFRAEVKN

851 LFDRRYIDPL DAGNDAATQR YYSSFDPKDK DEDVTCNADK TLCNGKYGGT

901 SKSVLTNFAR GRTFLMTMSY KF*



A variant was also identified, being encoded by the gonococcal DNA sequence <SEQ ID 883>: 



1 ATGAGATCTT CTTTCCGGTT GAAGCCGATT TGTTTTTATC TTATGGGTGT

51 TATGCTATAT CATCATAGTT ATGCCGAAGA TGCAGGGCGC GCGGGCAGCG

101 AGGCGCAGAT ACAGGTTTTG GAAGATGTGC ACGTCAAGGC GAAGCGCGTA

151 CCGAAAGACA AAAAAGTGTT TACCGATGCG CGTGCCGTAT CGACCCGTca

201 gGATGTGTTC AAATCCGGCG AAAACCTCGA CAACATCGTA CGCAGCATAC

251 CCGGTGCGTT TACACAGCAA GATAAAAGCT CGGGCATTGT GTCTTTGAAT

301 ATTCGCGGCG ACAGCGGGTT CGGGCGGGTC AATACGATGG TGGACGGCAT

351 CACGCAGACC TTTTATTCGA CTTCTACCGA TGCGGGCAGG GCAGGCGGTT

401 CATCTCAATT CGGTGCATCT GTCGACAGCA ATTTTATTGC CGGACTGGAT

451 GTCGTCAAAG GCAGCTTCAG CGGCTCGGCA GGCATCAACA GCCTTGCCGG

501 TTCGGCGAAT CTGCGGACTT TAGGCGTGGA TGACGTCGTT CAGGGCAATA

551 ATACCTACGG CCTGCTGCTA AAAGGTCTGA CCGGCACCAA TTCAACCAAA

601 GGTAATGCGA TGGCGGCGAT AGGTGCGCGC AAATGGCTGG AAAGCGGAGC

651 GTCTGTCGGT GTGCTTTACG GGCACAGCAG GCGCGGCGTG GCGCAAAATT

701 ACCGCGTGGG CGGCGGCGGG CAGCACATCG GAAATTTTGG TGAAGAATAT

751 CTGGAACGGC GCAAACAGCA ATATTTTGTA CAAGAGGGTG GTTTGAAATT

801 CAATGCCGGC AGCGGAAAAT GGGAACGGGA TTTGCAAAGG CAATACTGGA

851 AAACAAAGTG GTATAAAAAA TACGAAGACC CCCAAGAACT GCAAAAATAC

901 ATCGAAGAGC ATGATAAAAG CTGGCGGGAA AACCTGGCGC CGCAATACGA

951 CATCACCCCC ATCGATCCGT CCGGCCTGAA GCAGCAGTCG GCAGGCAATC

1001 TGTTTAAATT GGAATACGAC GGCGTATTCA ATAAATACAC GGCGCAATTT

1051 CGCGATTTAA ACACCAGAAT CGGCAGCCGC AAAATCATCA ACCGCAATTA

1101 TCAATTCAAT TACGGTTTGT CTTTGAACCC GTATACCAAC CTCAATCTGA

1151 CCGCAGCCTA CAATTCGGGC AGGCAGAAAT ATCCGAAAGG GGCGAAGTTT

1201 ACAGGCTGGG GGCTTTTAAA AGATTTTGAA ACCTACAACA ACGCGAAAAT

1251 CCTCGACCTC AACAACACCG CCACCTTCCG GCTGCCCCGC GAAACCGAGT

1301 TGCAAACCAC TTTGGGCTTC AATTATTTCC ACAACGAATA CGGCAAAAAC

1351 CGCTTTCCTG AAGAATTGGG GCTGTTTTTC GACGGTCCTG ATCAGGACAA

1401 CGGGCTTTAT TCCTATTTGG GGCGGTTTAA GGGCGATAAA GGGCTGTTGC

1451 CTCAAAAATC AACCATTGTC CAACCGGCCG GCAGCCAATA TTTCAACACG

1501 TTCTACTTCG ATGCCGCGCT CAAAAAAGAC ATTTACCGCT TAAACTACAG

1551 CACCAATGCA ATCAACTACC GTTTCGGCGG CGAATATACG GGCTATTACG

1601 GCTCGGAAAA CGAATTTAAG CGGGCATTCG GAGAAAACTC GCCGGCATAC

1651 AAGGAACATT GCGACCCGAG CTGCGGGCTT TATGAACCCG TATTGAAAAA

1701 ATACGGCAAA AAGCGCGCCA ACAACCATTC GGTCAGCATT AGTGCGGACT

1751 TCGGCGATTA TTTCATGCCG TTCGCCGGCT ATTCGCGCAC ACACCGTATG

1801 CCCAACATCC AAGAAATGTA TTTTTCCCAA ATCGGCGACT CCGGCGTTCA

1851 CACCGCCTTA AAACCAGAGC GCGCAAACAC TTGGCAATTT GGCTTCAATA

1901 CCTATAAAAA AGGATTGTTA AAACAAGATG ATATATTAGG ATTGAAACTG

1951 GTCGGCTACC GCAGCCGCAT TGACAACTAC ATCCACAACG TTTACGGGAA

2001 ATGGTGGGAT TTGAACGGGG ATATTCCGAG CTGGGTCGGC AGCACCGGGC

2051 TTGCCTACAC CATCCGACAC CGCAATTTCA AAGACAAAGT GCACAAACAC

2101 GGTTTTGAGC TGGAGCTGAA TTACGATTAT GGGCGTTTTT TCACCAACCT

2151 TTCTTACGCC TATCAAAAAA GCACGCAACC GACCAATTTC AGCGATGCGA

2201 GCGAATCGCC CAACAATGCC tccaaAGAAG ACCAACTCAA ACAAGGTTAT

2251 GGGCTGAGCA GGGTTTCCGC CCTGCCGCGA GATTACGGAC GTTTGGAAGT

2301 CGGTACGCGC TGGTTGGGCA ACAAACTGAC TTTGGGCGGC GCGAtgcGCT

2351 ATTTCGGCAA GAGCATCCGC GCGACGGCTG AAGAACGCTA TATCGACGGC

2401 ACCAACGGGG GAAATACCAG CAATGTCCGG CAACTGGGCA AGCGTTCCAT

2451 CAAACAAACC GAAACCCTTG CCCGACAGCC TTTGATTTTT GATTTTTACG

2501 CCGCTTACGA GCCGAAGAAA AACCTTATTT TCCGCGCCGA AGTCAAAAAC

2551 CTGTTCGACA GGCGTTATAT CGATCCGCTC GATGCGGGCA ATGATGCGGC

2601 AACGCAGCGT TATTACAGCT CGTTCGACCC GAAAGACAAG GACGAAGACG

2651 TAACGTGTAA TGCTGATAAA ACGTTGTGCA ACGGCAAATA CGGCGGCACA

2701 AGCAAAAGCG TATTGACCAA TTTCGCACGC GGACGCACCT TCTTGATGAC

2751 GATGAGCTAC AAGTTTTAA



This corresponds to the amino acid sequence <SEQ ID 884; ORF133ng-1>:

1 MRSSFRLKPI CFYLMGVMLY HHSYAEDAGR AGSEAQIQVL EDVHVKAKRV

51 PKDKKVFTDA RAVSTRQDVF KSGENLDNIV RSIPGAFTQQ DKSSGIVSLN

101 IRGDSGFGRV NTMVDGITQT FYSTSTDAGR AGGSSQFGAS VDSNFIAGLD

151 VVKGSFSGSA GINSLAGSAN LRTLGVDDVV QGNNTYGLLL KGLTGTNSTK

201 GNAMAAIGAR KWLESGASVG VLYGHSRRGV AQNYRVGGGG QHIGNFGEEY

251 LERRKQQYFV QEGGLKFNAG SGKWERDLQR QYWKTKWYKK YEDPQELQKY

301 IEEHDKSWRE NLAPQYDITP IDPSGLKQQS AGNLFKLEYD GVFNKYTAQF

351 RDLNTRIGSR KIINRNYQFN YGLSLNPYTN LNLTAAYNSG RQKYPKGAKF

401 TGWGLLKDFE TYNNAKILDL NNTATFRLPR ETELQTTLGF NYFHNEYGKN

451 RFPEELGLFF DGPDQDNGLY SYLGRFKGDK GLLPQKSTIV QPAGSQYFNT

501 FYFDAALKKD IYRLNYSTNA INYRFGGEYT GYYGSENEFK RAFGENSPAY

551 KEHCDPSCGL YEPVLKKYGK KRANNHSVSI SADFGDYFMP FAGYSRTHRM

601 PNIQEMYFSQ IGDSGVHTAL KPERANTWQF GFNTYKKGLL KQDDILGLKL

651 VGYRSRIDNY IHNVYGKWWD LNGDIPSWVG STGLAYTIRH RNFKDKVHKH

701 GFELELNYDY GRFFTNLSYA YQKSTQPTNF SDASESPNNA SKEDQLKQGY

751 GLSRVSALPR DYGRLEVGTR WLGNKLTLGG AMRYFGKSIR ATAEERYIDG

801 TNGGNTSNVR QLGKRSIKQT ETLARQPLIF DFYAAYEPKK NLIFRAEVKN

851 LFDRRYIDPL DAGNDAATQR YYSSFDPKDK DEDVTCNADK TLCNGKYGGT

901 SKSVLTNFAR GRTFLMTMSY KF*

ORF133ng-1 and ORF133-1 show 96.2% identity in 889 aa overlap:



10 20 30 40 50 60
orf133ng-1.pep SFRLKPICFYLMGVMLYHHSYAEDAGRAGSEAQIQVLEDVHVKAKRVPKDKKVFTDARAV
orf133-1                                     EAQIQVLEDVHVKAKRVPKDKKVFTDARAV
10 20 30

70 80 90 100 110 120
orf133ng-1.pep STRQDVFKSGENLDNIVRSIPGAFTQQDKSSGIVSLNIRGDSGFGRVNTMVDGITQTFYS
orf133-1       STRQDIFKSSENLDNIVRSIPGAFTQQDKSSGIVSLNIRGDSGFGRVNTMVDGITQTFYS
40 50 60 70 80 90

130 140 150 160 170 180
orf133ng-1.pep TSTDAGRAGGSSQFGASVDSNFIAGLDVVKGSFSGSAGINSLAGSANLRTLGVDDVVQGN
orf133-1       TSTDAGRAGGSSQFGASVDSNFIAGLDVVKGSFSGSAGINSLAGSANLRTLGVDDVVQGN
100 110 120 130 140 150

190 200 210 220 230 240
orf133ng-1.pep NTYGLLLKGLTGTNSTKGNAMAAIGARKWLESGASVGVLYGHSRRGVAQNYRVGGGGQHI
orf133-1       NTYGLLLKGLTGTNSTKGNAMAAIGARKWLESGASVGVLYGHSRRSVAQNYRVGGGGQHI
160 170 180 190 200 210

250 260 270 280 290 300
orf133ng-1.pep GNFGEEYLERRKQQYFVQEGGLKFNAGSGKWERDLQRQYWKTKWYKKYEDPQELQKYIEE
orf133-1       GNFGAEYLERRKQRYFVQEGALKFNSDSGKWERDLQRQQWKYKPYKNYNN-QELQKYIEE
220 230 240 250 260

310 320 330 340 350 360
orf133ng-1.pep HDKSWRENLAPQYDITPIDPSGLKQQSAGNLFKLEYDGVFNKYTAQFRDLNTRIGSRKII
orf133-1       HDKSWRENLXPQYDITPIDPSSLKQQSAGNLFKLEYDGVFNKYTAQFRDLNTKIGSRKII
270 280 290 300 310 320

370 380 390 400 410 420
orf133ng-1.pep NRNYQFNYGLSLNPYTNLNLTAAYNSGRQKYPKGAKFTGWGLLKDFETYNNAKILDLNNT
orf133-1       NRNYQFNYGLSLNPYTNLNLTAAYNSGRQKYPKGSKFTGWGLLKDFETYNNAKILDLNNT
330 340 350 360 370 380

430 440 450 460 470 480
orf133ng-1.pep ATFRLPRETELQTTLGFNYFHNEYGKNRFPEELGLFFDGPDQDNGLYSYLGRFKGDKGLL
orf133-1       ATFRLPRETELQTTLGFNYFHNEYGKNRFPEELGLFFDGPDQDNGLYSYLGRFKGDKGLL
390 400 410 420 430 440

490 500 510 520 530 540
orf133ng-1.pep PQKSTIVQPAGSQYFNTFYFDAALKKDIYRLNYSTNAINYRFGGEYTGYYGSENEFKRAF
orf133-1       PQKSTIVQPAGSQYFNTFYFDAALKKDIYRLNYSTNTVGYRFGGEYTGYYGSDDEFKRAF
450 460 470 480 490 500

550 560 570 580 590 600
orf133ng-1.pep GENSPAYKEHCDPSCGLYEPVLKKYGKKRANNHSVSISADFGDYFMPFAGYSRTHRMPNI
orf133-1       GENSPTYKKHCNRSCGIYEPVLKKYGKKRANNHSVSISADFGDYFMPFASYSRTHRMPNI
510 520 530 540 550 560

610 620 630 640 650 660
orf133ng-1.pep QEMYFSQIGDSGVHTALKPERANTWQFGFNTYKKGLLKQDDILGLKLVGYRSRIDNYIHN
orf133-1       QEMYFSQIGDSGVHTALKPERANTWQFGFNTYKKGLLKQDDTLGLKLVGYRSRIDNYIHN
570 580 590 600 610 620

670 680 690 700 710 720
orf133ng-1.pep VYGKWWDLNGDIPSWVGSTGLAYTIRHRNFKDKVHKHGFELELNYDYGRFFTNLSYAYQK
orf133-1       VYGKWWDLNGDIPSWVSSTGLAYTIQHRNFKDKVHKHGFELELNYDYGRFFTNLSYAYQK
630 640 650 660 670 680

730 740 750 760 770 780
orf133ng-1.pep STQPTNFSDASESPNNASKEDQLKQGYGLSRVSALPRDYGRLEVGTRWLGNKLTLGGAMR
orf133-1       STQPTNFSDASESPNNASKEDQLKQGYGLSRVSALPRDYGRLEVGTRWLGNKLTLGGAMR
690 700 710 720 730 740

790 800 810 820 830 840
orf133ng-1.pep YFGKSIRATAEERYIDGTNGGNTSNVRQLGKRSIKQTETLARQPLIFDFYAAYEPKKNLI
orf133-1       YFGKSIRATAEERYIDGTNGGNTSNFRQLGKRSIKQTETLARQPLIFDFYAAYEPKKNLI
750 760 770 780 790 800

850 860 870 880 890 900
orf133ng-1.pep FRAEVKNLFDRRYIDPLDAGNDAATQRYYSSFDPKDKDEDVTCNADKTLCNGKYGGTSKS
orf133-1       FRAEVKNLFDRRYIDPLDAGNDAATQRYYSSFDPKDKDEDVTCNADKTLCNGKYGGTSKS
810 820 830 840 850 860

910 920
orf133ng-1.pep VLTNFARGRTFLMTMSYKFX
orf133-1       VLTNFARGRTFLMTMSYKFX
870 880

In addition, ORF133ng-1 is homologous to a TonB-dependent receptor in H. influenzae:
sp|P45114|YC17_HAEIN PROBABLE TONB-DEPENDENT RECEPTOR HI1217 PRECURSOR
>gi|1075372|pir||G64110 transferrin binding protein 1 precursor (tbp1) homolog -
Haemophilus influenzae (strain Rd KW20) >gi|1574147 (U32801) transferrin binding
protein 1 precursor (tbp1) [Haemophilus influenzae] Length = 913
Score = 930 bits (2377), Expect = 0.0

Identities = 476/921 (51%), Positives = 619/921 (66%), Gaps = 72/921 (7%)
Query: 38 QVLEDVHVKAKRVPKDKKVFTDARAVSTRQDVFKSGENLDNIVRSIPGAFTQQDKSSGIV 97

+ L + V K + DKK FT+A+A STR++VFK + +D ++RSIPGAFTQQDK SG+V
Sbjct: 29 ETLGQIDVVEKVISNDKKPFTEAKAKSTRENVFKETQTIDQVIRSIPGAFTQQDKGSGVV 88

Query: 98 SLNIRGDSGFGRVNTMVDGITQTFYSTSTDAGRAGGSSQFGASVDSNFIAGLDVVKGSFS 157

S+NIRG++G GRVNTMVDG+TQTFYST+ D+G++GGSSQFGA++D NFIAG+DV K +FS 
Sbjct: 89 SVNIRGENGLGRVNTMVDGVTQTFYSTALDSGQSGGSSQFGAAIDPNFIAGVDVNKSNFS 148 

Query: 158 GSAGINSLAGSANLRTLGVDDVVQXXXXXXXXXXXXXXXXXXXXXAMAAIGARKWLESGA 217

G++G IN+LAGS AN RTLGV+DV+ M RKWL++G 

Sbjct: 149 GASGINALAGSANFRTLGVNDVITDDKPFGIILKGMTGSNATKSNFMTMAAGRKWLDNGG 208

Query: 218 SVGVLYGHSRRGVAQNYRVGGGGQHIGNFGEEYLERRKQQYFVQEGGLKFNAGSGKWERD 277 

VGV+YG+S+R V+Q+YR+ GGG+ + + G++ L + K+ YF + G N G+W D 
Sbjct: 209 YVGVVYGYSQREVSQDYRI-GGGERLASLGQDILAKEKEAYF-RNAGYILNP-EGQWTPD 265 

Query: 278 LQRQYWK TKWY KKYEDPQELQK YIEE 303 

L +++W +Y KK +D ++LQK IEE 

Sbjct: 266 LSKKHWSCNKPDYQKNGDCSYYRIGSAAKTRREILQELLTNGKKPKDIEKLQKGNDGIEE 325 

Query: 304 HDKSWRENLAPQYDITPIDPSGLKQQSAGNLFKLEYDGVFNKYTAQFRDLNTRIGSRKII 363 

DKS+ N QY + PI+P L+ +S +L K EY AQ R L+ +IGSRKI 

Sbjct: 326 TDKSFERN-KDQYSVAPIEPGSLQSRSRSHLLKFEYGDDHQNLGAQLRTLDNKIGSRKIE 384 

Query: 364 NRNYQFNYGLSLNPYTNLNLTAAYNSGRQKYPKGAKFTGWGLLKDFETYNNAKILDLNNT 423 

NRNYQ NY + N Y +LNL AA+N G+ YPKG F GW + T N A I+D+NN+ 

Sbjct: 385 NRNYQVNYNFNNNSYLDLNLMAAHNIGKTIYPKGGFFAGWQVADKLITKNVANIVDINNS 444 

Query: 424 ATFRLPRETELQTTLGFNYFHNEYGKNRFPEELGLFFDGPDQDNGLYSY--LGRFKGDKG 481

TF LP+E +L+TTLGFNYF NEY KNRFPEEL LF++ D GLYS+ GR+ G K 
Sbjct: 445 HTFLLPKEIDLKTTLGFNYFTNEYSKNRFPEELSLFYNDASHDQGLYSHSKRGRYSGTKS 504 

Query: 482 LLPQKSTIVQPAGSQYFNTFYFDAALKKDIYRLNYSTNAINYRFGGEYTGYYGSENEFKR 541 

LLPQ+S I+QP+G Q F T YFD AL K IY LNYS N +Y F GEY GY 
Sbjct: 505 LLPQRSVILQPSGKQKFKTVYFDTALSKGIYHLNYSVNFTHYAFNGEYVGY 555 

Query: 542 AFGENSPAYKEHCDPSCGLYEPVLKKYGKKRANNHSVSISADFGDYFMPFAGYSRTHRMP 601 

EN+ + + EP+L K G K+A NHS ++SA+ DYFMPF YSRTHRMP 

Sbjct: 556 ENTAGQQ INEPILHKSGHKKAFNHSATLSAELSDYFMPFFTYSRTHRMP 604 

Query: 602 NIQEMYFSQIGDSGVHTALKPERANTWQFGFNTYKKGLLKQDDILGLKLVGYRSRIDNYI 661 

NIQEM+FSQ+ ++GV+TALKPE+++T+Q GFNTYKKGL QDD+LG+KLVGYRS I NYI 
Sbjct: 605 NIQEMFFSQVSNAGVNTALKPEQSDTYQLGFNTYKKGLFTQDDVLGVKLVGYRSFIKNYI 664 

Query: 662 HNVYGKWWDLNGDIPSWVGSTGLAYTIRHRNFKDKVHKHGFELELNYDYGRFFTNLSYAY 721 

HNVYG WW +P+W S G YTI H+N+K V K G ELE+NYD GRFF N+SYAY 

Sbjct: 665 HNVYGVWW--RDGMPTWAESNGFKYTIAHQNYKPIVKKSGVELEINYDMGRFFANVSYAY 722

Query: 722 QKSTQPTNFSDASESPNNASKEDQLKQGYGLSRVSALPRDYGRLEVGTRWLGNKLTLGGA 781 

Q++ QPTN++DAS PNNAS+ED LKQGYGLSRVS LP+DYGRLE+GTRW KLTLG A 
Sbjct: 723 QRTNQPTNYADASPRPNNASQEDILKQGYGLSRVSMLPKDYGRLELGTRWFDQKLTLGLA 782 

Query: 782 MRYFGKSIRATAEERYIDGTNGGNTSNVRQLGKRSIKQTETLARQPLIFDFYAAYEPKKN 841

RY+GKS RAT EE YI+G+ + +R+ ++K+TE + +QP+I D + +YEP K+

Sbjct: 783 ARYYGKSKRATIEEEYINGSR-FKKNTLRRENYYAVKKTEDIKKQPIILDLHVSYEPIKD 841 

Query: 842 LIFRAEVKNLFDRRYIDPLDAGNDAATQRYYSSFDPKDKDEDVTCNADKTLCNGKYGGTS 901 

LI +AEV+NL D+RY+DPLDAGNDAA+QRYYSS + + C D + C GG+ 
Sbjct: 842 LIIKAEVQNLLDKRYVDPLDAGNDAASQRYYSSL NNSIECAQDSSAC GGSD 892 

Query: 902 KSVLTNFARGRTFLMTMSYKF 922 

K+VL NFARGRT++++++YKF 
Sbjct: 893 KTVLYNFARGRTYILSLNYKF 913 
The underlined motif in the gonococcal protein (also present in the meningococcal protein) is
predicted to be an ATP/GTP-binding site motif A (P-loop), and the analysis suggests that these 
proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be useful antigens for 
vaccines or diagnostics, or for raising antibodies. 
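The underlining that marked the motif did not survive in this copy, so the exact residues are not stated here. As a minimal sketch, assume the PROSITE PS00017 consensus for motif A, [AG]-x(4)-G-K-[ST]. A regular-expression scan then finds the candidate AMRYFGKS at residues 781-788 of ORF133ng-1. The same eight residues occur in ORF133 and ORF133a, and the aligned H. influenzae segment reads AARYYGKS. The function name `find_p_loops` and its `offset` parameter are illustrative assumptions. A consensus match only flags a candidate site; it does not demonstrate nucleotide binding.

```python
import re

# PROSITE PS00017 (ATP_GTP_A) consensus [AG]-x(4)-G-K-[ST]; the lookahead
# lets overlapping candidates be reported.
P_LOOP = re.compile(r"(?=([AG].{4}GK[ST]))")


def find_p_loops(protein: str, offset: int = 0) -> list[tuple[int, str]]:
    """Return (1-based position, motif) for each consensus match.

    `offset` is the residue number just before the first residue of `protein`,
    so a fragment can be reported in full-length numbering.
    """
    return [(offset + m.start() + 1, m.group(1)) for m in P_LOOP.finditer(protein)]


# Residues 751-800 of ORF133ng-1 (SEQ ID 884).
fragment = "GLSRVSALPRDYGRLEVGTRWLGNKLTLGGAMRYFGKSIRATAEERYIDG"
print(find_p_loops(fragment, offset=750))  # [(781, 'AMRYFGKS')]
```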

Example 104 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 885> 

1 ATGAACCTGA TTTCACGTTA CATCATCCGT CAAATGGCGG TTATGGCGGT 

51 TTACGCGCTC CTTGCCTTCC TCGCTTTGTA CAGCTTTTTT GAAATCCTGT 

101 ACGAAACCGG CAACCTCGGC AAAGGCAGTT ACGGCATATG GGAAATGCTG 

151 GGCTACACCG CCCTCAAAAT GCCCGCCCGC GCCTACGAAC TGATTCCCCT 

201 CGCCGTCCTT ATCGGCGGAC TGGTCTCCCT CAGCCAGCTT GCCGCCGGCA 

251 GCGAACTGAC CGTCATCAAA GCCAGCGGCA TGAGCACCAA AAAGCTGCTG 

301 TTGATTCTGT CGCAGTTCGG TTTTATTTTT GCTATTGCCA CCGTCGCGCT 

351 CGGCGAATGG GTTGCGCCCA CACTGAGCCA AAAAGCCGAA AACATCAAAG 

401 CCGCCGCCAT CAACGGCAAA ATCAGCACCG GCAATACCGG CCTTTGGCTG 

451 AAAGAAAAAA ACAGCGTGAT CAATGTGCGC GAAATGTTGC CCGACCAT...

This corresponds to the amino acid sequence <SEQ ID 886; ORF112>:

1 MNLISRYIIR QMAVMAVYAL LAFLALYSFF EILYETGNLG KGSYGIWEML 

51 GYTALKMPAR AYELIPLAVL IGGLVSLSQL AAGSELTVIK ASGMSTKKLL

101 LILSQFGFIF AIATVALGEW VAPTLSQKAE NIKAAAINGK ISTGNTGLWL

151 KEKNSVINVR EMLPDH...

Further work revealed a further partial nucleotide sequence <SEQ ID 887>:

1 ATGAACCTGA TTTCACGTTA CATCATCCGT CAAATGGCGG TTATGGCGGT 

51 TTACGCGCTC CTTGCCTTCC TCGCTTTGTA CAGCTTTTTT GAAATCCTGT 

101 ACGAAACCGG CAACCTCGGC AAAGGCAGTT ACGGCATATG GGAAATGCTG 

151 gGCTACACCG CCCTCAAAAT GCCCGCCCGC GCCTACGAAC TGATTCCCCT 

201 CGCCGTCCTT ATCGGCGGAC TGGTCTCCCT CAGCCAGCTT GCCGCCGGCA 

251 GCGAACTGAC CGTCATCAAA GCCAGCGGCA TGAGCACCAA AAAGCTGCTG 

301 TTGATTCTGT CGCAGTTCGG TTTTATTTTT GCTATTGCCA CCGTCGCGCT 

351 CGGCGAATGG GTTGCGCCCA CACTGAGCCA AAAAGCCGAA AACATCAAAG 

401 CCGCCGCCAT CAACGGCAAA ATCAGCACCG GCAATACCGG CCTTTGGCTG

451 AAAGAAAAAA ACAGCrTkAT CAATGTGCGC GAAATGTTGC CCGACCATAC 

501 GCTTTTGGGC ATCAAAATTT GGGCGCGCAA CGATAAAAAC GAATTGGCAG 

551 AGGCAGTGGA AGCCGATTCC GCCGTTTTGA ACAGCGACGG CAGTTGGCAG 

601 TTGAAAAACA TCCGCCGCAG CACGCTTGGC GAAGACAAAG TCGAGGTCTC 

651 TATTGCGGCT GAAGAAAACT GGCCGATTTC CGTCAAACGC AACCTGATGG 

701 ACGTATTGCT CGTCAAACCC GACCAAATGT CCGTCGGCGA ACTGACCACC 

751 TACATCCGCC ACCTCCAAAA CAACAGCCAA AACACCCGAA TCTACGCCAT 

801 CGCATGGTGG CGCAAATTGG TTTACCCCGC CGCAGCCTGG GTGATGGCGC 

851 TCGTCGCCTT TGCCTTTACC CCGCAAACCA CCCGCCACGG CAATATGGGC 

901 TTAAAACTCT TCGGCGGCAT CTGTsTCGGA TTGCTGTTCC ACCTTGCCGG 

951 ACGGCTCTTT GGGTTTACCA GCCAACTCGG...

This corresponds to the amino acid sequence <SEQ ID 888; ORF112-1>:

1 MNLISRYIIR QMAVMAVYAL LAFLALYSFF EILYETGNLG KGSYGIWEML

51 GYTALKMPAR AYELIPLAVL IGGLVSLSQL AAGSELTVIK ASGMSTKKLL

101 LILSQFGFIF AIATVALGEW VAPTLSQKAE NIKAAAINGK ISTGNTGLWL

151 KEKNSXINVR EMLPDHTLLG IKIWARNDKN ELAEAVEADS AVLNSDGSWQ

201 LKNIRRSTLG EDKVEVSIAA EENWPISVKR NLMDVLLVKP DQMSVGELTT

251 YIRHLQNNSQ NTRIYAIAWW RKLVYPAAAW VMALVAFAFT PQTTRHGNMG

301 LKLFGGICXG LLFHLAGRLF GFTSQL...

Computer analysis of this amino acid sequence predicts two transmembrane domains and gave the 
following results: 
Homology with a predicted ORF from N. meningitidis (strain A)

ORF112 shows 96.4% identity over a 166aa overlap with an ORF (ORF112a) from strain A of N.
meningitidis:

                    10        20        30        40        50        60
orf112.pep  MNLISRYIIRQMAVMAVYALLAFLALYSFFEILYETGNLGKGSYGIWEMLGYTALKMPAR
orf112a     MNLISRYIIRQMAVMAVYALLAFLALYSFFEILYETGNLGKGSYGIWEMXGYTALKMXAR
                    10        20        30        40        50        60

                    70        80        90       100       110       120
orf112.pep  AYELIPLAVLIGGLVSLSQLAAGSELTVIKASGMSTKKLLLILSQFGFIFAIATVALGEW
orf112a     AYELMPLAVLIGGLVSXSQLAAGSELXVIKASGMSTKKLLLILSQFGFIFAIATVALGEW
                    70        80        90       100       110       120

                   130       140       150       160
orf112.pep  VAPTLSQKAENIKAAAINGKISTGNTGLWLKEKNSVINVREMLPDH
orf112a     VAPTLSQKAENIKAAAINGKISTGNTGLWLKEKNSIINVREMLPDHTLLGIKIWARNDKN
                   130       140       150       160       170       180

orf112a     ELAEAVEADSAVLNSDGSWQLKNIRRSTLGEDKVEVSIAAEEXWPISVKRNLMDVLLVKP
                   190       200       210       220       230       240

The ORF112a nucleotide sequence <SEQ ID 889> is:

1 ATGAACCTGA TTTCACGTTA CATCATCCGT CAAATGGCGG TTATGGCGGT

51 TTACGCGCTC CTTGCCTTCC TCGCTTTGTA CAGCTTTTTT GAAATCCTGT

101 ACGAAACCGG CAACCTCGGC AAAGGCAGTT ACGGCATATG GGAAATGNTG

151 GGNTACACCG CCCTCAAAAT GNCCGCCCGC GCCTACGAAC TGATGCCCCT

201 CGCCGTCCTT ATCGGCGGAC TGGTCTCTNT CAGCCAGCTT GCCGCCGGCA

251 GCGAACTGAN CGTCATCAAA GCCAGCGGCA TGAGCACCAA AAAGCTGCTG

301 TTGATTCTGT CGCAGTTCGG TTTTATTTTT GCTATTGCCA CCGTCGCGCT

351 CGGCGAATGG GTTGCGCCCA CACTGAGCCA AAAAGCCGAA AACATCAAAG

401 CCGCGGCCAT CAACGGCAAA ATCAGTACCG GCAATACCGG CCTTTGGCTG

451 AAAGAAAAAA ACAGCATTAT CAATGTGCGC GAAATGTTGC CCGACCATAC

501 CCTGCTGGGC ATTAAAATCT GGGCCCGCAA CGATAAAAAC GAACTGGCAG

551 AGGCAGTGGA AGCCGATTCC GCCGTTTTGA ACAGCGACGG CAGTTGGCAG

601 TTGAAAAACA TCCGCCGCAG CACGCTTGGC GAAGACAAAG TCGAGGTCTC

651 TATTGCGGCT GAAGAAAANT GGCCGATTTC CGTCAAACGC AACCTGATGG

701 ACGTATTGCT CGTCAAACCC GACCAAATGT CCGTCGGCGA ACTGACCACC

751 TACATCCGCC ACCTCCAAAN NNACAGCCAA AACACCCGAA TCTACGCCAT

801 CGCATGGTGG CGCAAATTGG TTTACCCCGC CGCAGCCTGG GTGATGGCGC

851 TCGTCGCCTT TGCCTTTACC CCGCAAACCA CCCGCCACGG CAATATGGGC

901 TTAAAANTCT TCGGCGGCAT CTGTCTCGGA TTGCTGTTCC ACCTTGCCGG

951 NCGGCTCTTC NGGTTTACCA GCCAACTCTA CGGCATCCCG CCCTTCCTCG

1001 NCGGCGCACT ACCTACCATA GCCTTCGCCT TGCTCGCCGT TTGGCTGATA

1051 CGCAAACAGG AAAAACGCTA A



This encodes a protein having the amino acid sequence <SEQ ID 890>: 



1 MNLISRYIIR QMAVMAVYAL LAFLALYSFF EILYETGNLG KGSYGIWEMX

51 GYTALKMXAR AYELMPLAVL IGGLVSXSQL AAGSELXVIK ASGMSTKKLL

101 LILSQFGFIF AIATVALGEW VAPTLSQKAE NIKAAAINGK ISTGNTGLWL

151 KEKNSIINVR EMLPDHTLLG IKIWARNDKN ELAEAVEADS AVLNSDGSWQ

201 LKNIRRSTLG EDKVEVSIAA EEXWPISVKR NLMDVLLVKP DQMSVGELTT

251 YIRHLQXXSQ NTRIYAIAWW RKLVYPAAAW VMALVAFAFT PQTTRHGNMG

301 LKXFGGICLG LLFHLAGRLF XFTSQLYGIP PFLXGALPTI AFALLAVWLI

351 RKQEKR* 
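The computer analysis reported above for ORF112-1 predicts two transmembrane domains but does not name the method. As one illustration only, the sketch below runs a Kyte-Doolittle hydropathy scan: each 19-residue window is averaged, and windows scoring above 1.6 (the window size and cut-off commonly quoted from Kyte and Doolittle for membrane-spanning segments) are merged into candidate segments. The window, threshold, merging rule and the choice to score X as 0 are all assumptions, and the function name hydrophobic_segments is hypothetical, so its output for ORF112, ORF112a or ORF112-1 need not match the prediction in the text.

```python
# Kyte-Doolittle hydropathy index (Kyte & Doolittle, 1982).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobic_segments(protein: str, window: int = 19,
                         threshold: float = 1.6) -> list[tuple[int, int]]:
    """Return 1-based (start, end) spans covered by windows whose mean
    hydropathy exceeds threshold; overlapping windows are merged.

    Unknown residues such as X score 0.0 and a trailing stop '*' is dropped;
    both choices are assumptions of this sketch.
    """
    seq = protein.upper().rstrip("*")
    segments: list[tuple[int, int]] = []
    for i in range(len(seq) - window + 1):
        mean = sum(KD.get(aa, 0.0) for aa in seq[i:i + window]) / window
        if mean > threshold:
            start, end = i + 1, i + window
            if segments and start <= segments[-1][1]:
                segments[-1] = (segments[-1][0], end)  # extend the open segment
            else:
                segments.append((start, end))
    return segments


# Synthetic check (not patent data): one 25-residue Leu stretch in a Lys background.
print(hydrophobic_segments("K" * 20 + "L" * 25 + "K" * 20))  # [(16, 50)]
```

Dedicated transmembrane predictors weigh more than raw hydropathy, so a scan like this is best read as a first screen rather than a reproduction of the analysis described in the example.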

ORF112a and ORF112-1 show 96.3% identity in a 326 aa overlap:

orf112a.pep  MNLISRYIIRQMAVMAVYALLAFLALYSFFEILYETGNLGKGSYGIWEMXGYTALKMXAR
orf112-1     MNLISRYIIRQMAVMAVYALLAFLALYSFFEILYETGNLGKGSYGIWEMLGYTALKMPAR

orf112a.pep  AYELMPLAVLIGGLVSXSQLAAGSELXVIKASGMSTKKLLLILSQFGFIFAIATVALGEW
orf112-1     AYELIPLAVLIGGLVSLSQLAAGSELTVIKASGMSTKKLLLILSQFGFIFAIATVALGEW

orf112a.pep  VAPTLSQKAENIKAAAINGKISTGNTGLWLKEKNSIINVREMLPDHTLLGIKIWARNDKN
orf112-1     VAPTLSQKAENIKAAAINGKISTGNTGLWLKEKNSXINVREMLPDHTLLGIKIWARNDKN

orf112a.pep  ELAEAVEADSAVLNSDGSWQLKNIRRSTLGEDKVEVSIAAEEXWPISVKRNLMDVLLVKP
orf112-1     ELAEAVEADSAVLNSDGSWQLKNIRRSTLGEDKVEVSIAAEENWPISVKRNLMDVLLVKP

orf112a.pep  DQMSVGELTTYIRHLQXXSQNTRIYAIAWWRKLVYPAAAWVMALVAFAFTPQTTRHGNMG
orf112-1     DQMSVGELTTYIRHLQNNSQNTRIYAIAWWRKLVYPAAAWVMALVAFAFTPQTTRHGNMG

orf112a.pep  LKXFGGICLGLLFHLAGRLFXFTSQLYGIPPFLXGALPTIAFALLAVWLIRKQEKRX
orf112-1     LKLFGGICXGLLFHLAGRLFGFTSQL

Homology with a predicted ORF from N. gonorrhoeae

ORF112 shows 95.8% identity over a 166aa overlap with a predicted ORF (ORF112ng) from N.
gonorrhoeae:

orf112.pep  MNLISRYIIRQMAVMAVYALLAFLALYSFFEILYETGNLGKGSYGIWEMLGYTALKMPAR  60
orf112ng    MNLISRYIIRQMAVMAVYALLAFLALYSFFEILYETGNLGKGSYGIWEMLGYTALKMPAR  60

orf112.pep  AYELIPLAVLIGGLVSLSQLAAGSELTVIKASGMSTKKLLLILSQFGFIFAIATVALGEW 120
orf112ng    AYELMPLAVLIGGLASLSQLAAGSELAVIKASGMSTKKLLLILSQFGFIFAIAAVALGEW 120

orf112.pep  VAPTLSQKAENIKAAAINGKISTGNTGLWLKEKNSVINVREMLPDH               166
orf112ng    VAPTLSQKAENIKAAAINGKISTGNTGLWLKEKTSIINVRGMLPDHTLLGIKIWARNDKN 180
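The identity figures quoted in this example can be reproduced from the alignments, assuming identity is the number of identical aligned positions divided by the length of the overlap and that an X never counts as identical (the text does not state how its program handles either point). For ORF112 against ORF112ng, 159 of the 166 aligned residues are identical, giving 95.8%. A minimal Python sketch under those assumptions; the function and constant names are illustrative.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over an ungapped overlap of two equal-length strings.

    Ambiguous residues (X) never count as identical.
    """
    if not aligned_a or len(aligned_a) != len(aligned_b):
        raise ValueError("aligned segments must be non-empty and of equal length")
    matches = sum(a == b and a != "X" for a, b in zip(aligned_a, aligned_b))
    return 100.0 * matches / len(aligned_a)


# Residues 1-166 of ORF112 and ORF112ng, copied from the alignment above.
ORF112 = ("MNLISRYIIR QMAVMAVYAL LAFLALYSFF EILYETGNLG KGSYGIWEML GYTALKMPAR "
          "AYELIPLAVL IGGLVSLSQL AAGSELTVIK ASGMSTKKLL LILSQFGFIF AIATVALGEW "
          "VAPTLSQKAE NIKAAAINGK ISTGNTGLWL KEKNSVINVR EMLPDH").replace(" ", "")
ORF112NG_1_166 = ("MNLISRYIIR QMAVMAVYAL LAFLALYSFF EILYETGNLG KGSYGIWEML GYTALKMPAR "
                  "AYELMPLAVL IGGLASLSQL AAGSELAVIK ASGMSTKKLL LILSQFGFIF AIAAVALGEW "
                  "VAPTLSQKAE NIKAAAINGK ISTGNTGLWL KEKTSIINVR GMLPDH").replace(" ", "")

print(f"{percent_identity(ORF112, ORF112NG_1_166):.1f}%")  # 95.8%
```

The same function gives 96.4% for ORF112 against the first 166 residues of ORF112a (160 identical positions, the four X residues counting as mismatches), in line with the figure given for strain A.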



The full-length ORF112ng nucleotide sequence <SEQ ID 891> is:

1 ATGAACCTGA TTTCACGTTA CATCATCCGC CAAATGGCGG TTATGGCGGT

51 TTACGCGCTC CTTGCCTTCC TCGCTTTGTA CAGCTTTTTT GAAATCCTGT

101 ACGAAACCGG CAACCTCGGC AAAGGCAGTT ACGGCATATG GGAAATGCTG

151 GGCTACACCG CCCTCAAAAT GCCCGCCCGC GCCTACGAAC TCATGCCCCT

201 CGCCGTCCTC ATCGGCGGAC TGGCCTCTCT CAGCCAGCTT GCCGCCGGCA

251 GCGAACTGGC CGTCATCAAA GCCAGCGGCA TGAGCACCAA AAAGCTGCTG

301 TTGATTCTGT CTCAGTTCGG TTTTATTTTT GCTATTGCCG CCGTCGCGCT

351 CGGCGAATGG GTTGCGCCCA CGCTGAGCCA AAAAGCCGAA AACATCAAag

401 cCGCCGCCAt taacggCAAA ATCAGCAccg gcAATACCGG CCTTTggcTG

451 AAAGAAAAAa ccAGCATTAT CAATGTGcGc GGAATGTTGC CCGACCATAC

501 GCTTTTGGGC ATCAAAATTT GGGCGCGCAA CGATAAAAAC GAATTGGCAG

551 AGGCAGTGGA AGCCGATTCC GCCGTTTTGA ACAGCGACGG CAGCTGGCAG

601 TTGAAAAACA TCCGCCGCAG CATCATGGGT ACAGACAAAA TCGAAACATC

651 cgCCGCCGCC GAAGAAACTT gGCCGATTGC CGTCAGACGC AACCTGATGG

701 ACGTATTGCT CGTCAAGCCC GACCAAATGT CCGTCGGCGA GCTGACCACC

751 TACATCCGCC ACCTCCAAAA CAACAGCCAA AACACCCAAA TCTACGCCAT

801 CGCATGGTGG CGTAAACTCG TTTACCCCGT CGCCGCATGG GTCATGGCGC

851 TCGTTGCCTT CGCCTTTACG CCGCAAACCA CGCGCCACGG CAATATGGGC

901 TTAAAACTCT TCGGCGGCAT CTGTCTCGGA TTGCTGTTCC ACCTTGCCGG

951 CAGGCTCTTC GGGTTTACCA GCCAACTCTA CGGCACCCCA CCCTTCCTCG

1001 CCGGCGCACT GCCTACCATA GCCTTCGCCT TGCTCGCTGT TTGGCTGATA

1051 CGCAAACAGG AAAAACGTTG A



This encodes a protein having the amino acid sequence <SEQ ID 892>:

1 MNLISRYIIR QMAVMAVYAL LAFLALYSFF EILYETGNLG KGSYGIWEML

51 GYTALKMPAR AYELMPLAVL IGGLASLSQL AAGSELAVIK ASGMSTKKLL

101 LILSQFGFIF AIAAVALGEW VAPTLSQKAE NIKAAAINGK ISTGNTGLWL

151 KEKTSIINVR GMLPDHTLLG IKIWARNDKN ELAEAVEADS AVLNSDGSWQ

201 LKNIRRSIMG TDKIETSAAA EETWPIAVRR NLMDVLLVKP DQMSVGELTT

251 YIRHLQNNSQ NTQIYAIAWW RKLVYPVAAW VMALVAFAFT PQTTRHGNMG

301 LKLFGGICLG LLFHLAGRLF GFTSQLYGTP PFLAGALPTI AFALLAVWLI

351 RKQEKR*

ORF112ng and ORF112-1 show 94.2% identity in a 326 aa overlap:

                    10        20        30        40        50        60
orf112ng    MNLISRYIIRQMAVMAVYALLAFLALYSFFEILYETGNLGKGSYGIWEMLGYTALKMPAR
orf112-1    MNLISRYIIRQMAVMAVYALLAFLALYSFFEILYETGNLGKGSYGIWEMLGYTALKMPAR
                    10        20        30        40        50        60

                    70        80        90       100       110       120
orf112ng    AYELMPLAVLIGGLASLSQLAAGSELAVIKASGMSTKKLLLILSQFGFIFAIAAVALGEW
orf112-1    AYELIPLAVLIGGLVSLSQLAAGSELTVIKASGMSTKKLLLILSQFGFIFAIATVALGEW
                    70        80        90       100       110       120

                   130       140       150       160       170       180
orf112ng    VAPTLSQKAENIKAAAINGKISTGNTGLWLKEKTSIINVRGMLPDHTLLGIKIWARNDKN
orf112-1    VAPTLSQKAENIKAAAINGKISTGNTGLWLKEKNSXINVREMLPDHTLLGIKIWARNDKN
                   130       140       150       160       170       180

                   190       200       210       220       230       240
orf112ng    ELAEAVEADSAVLNSDGSWQLKNIRRSIMGTDKIETSAAAEETWPIAVRRNLMDVLLVKP
orf112-1    ELAEAVEADSAVLNSDGSWQLKNIRRSTLGEDKVEVSIAAEENWPISVKRNLMDVLLVKP
                   190       200       210       220       230       240

                   250       260       270       280       290       300
orf112ng    DQMSVGELTTYIRHLQNNSQNTQIYAIAWWRKLVYPVAAWVMALVAFAFTPQTTRHGNMG
orf112-1    DQMSVGELTTYIRHLQNNSQNTRIYAIAWWRKLVYPAAAWVMALVAFAFTPQTTRHGNMG
                   250       260       270       280       290       300

                   310       320       330       340       350
orf112ng    LKLFGGICLGLLFHLAGRLFGFTSQLYGTPPFLAGALPTIAFALLAVWLIRKQEKRX
orf112-1    LKLFGGICXGLLFHLAGRLFGFTSQL
                   310       320



This analysis suggests that these proteins from N. meningitidis and N. gonorrhoeae, and their 
epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies. 



It will be appreciated that the invention has been described by means of example only, and that 
modifications may be made whilst remaining within the spirit and scope of the invention. 
TABLE I - PCR primers

ORF     Primer   Restriction sites  Sequence
ORF1    Forward  BamHI-NheI         CGCGGATCCGCTAGC-GGACACACTTATTTCGG
        Reverse  XhoI               CCCGCTCGAG-CCAGCGGTAGCCTAATT
ORF2    Forward  BamHI-NdeI         GCGGATCCCATATG-TTTGATTTCGGTTTGGG
        Reverse  XhoI               CCCGCTCGAG-GACGGCATAACGGCG
ORF2-1  Forward  BamHI-NdeI         GCGGATCCCATATG-TTTGATTTCGGTTTGGG
        Reverse  XhoI               CCCGCTCGAG-TGATTTACGGACGCGCA
ORF4    Forward  BamHI-NdeI         GCGGATCCCATATG-TGCGGAGGTCAAAAAGAC
        Reverse  XhoI               CCCGCTCGAG-TTTGGCTGCGCCTTC
ORF5    Forward  NdeI-NcoI          GGAATTCCATATGGCCATGG-TGGAAGGCGCACAACC
        Forward  BamHI              CGGGATCC-ATGGAAGGCGCACAAC
        Reverse  XhoI               CCCGCTCGAG-GACTGTGCAAAAACGG
ORF6    Forward  BamHI-NdeI         CGCGGATCCCATATG-ACCCGTCAATCTCTGCA
        Reverse  XhoI               CCCGCTCGAG-TGCGCCGAACACTTTC
ORF7    Forward  BamHI-NheI         CGCGGATCCGCTAGC-GCGCTGCTTTTTGTTCC
        Reverse  XhoI               CCCGCTCGAG-TTTCAAAATATATTTGCGGA
ORF8    Forward  BamHI-NdeI         GCGGATCCCATATG-GCTCAACTGCTTCGTAC
        Reverse  XhoI               CCCGCTCGAG-AGCAGGCTTTGGCGC
ORF9    Forward  BamHI-NdeI         CGCGGATCCCATATG-CCGAAGGAAGTCGGAAA
        Reverse  XhoI               CCCGCTCGAG-TTTCCGAGGTTTTCGGG
ORF10   Forward  BamHI-NdeI         GCGGATCCCATATG-GACACAAAAGAAATCCTC
        Reverse  XhoI               CCCGCTCGAG-TAATGGGAAACCTTGTTTT
ORF11   Forward  BamHI-NdeI         GCGGATCCCATATG-GCGGTCAACCTCTACG
        Reverse  XhoI               CCCGCTCGAG-GGAAACGACTTCGCC
ORF13   Forward  BamHI-NdeI         CGCGGATCCCATATG-GCTCTGCTTTCCGCGC
        Reverse  XhoI               CCCGCTCGAG-AGGGTGTGTGATAATAAG
ORF15   Forward  NdeI-NcoI          GGAATTCCATATGGCCATGG-GCGGGACACTGACAG
        Forward  BamHI              CGGGATCC-TGCGGGACACTGACAGG
        Reverse  XhoI               CCCGCTCGAG-AGGTTGGCCTTGTCTATG
ORF17   Forward  NdeI-NcoI          GGAATTCCATATGGCCATGG-TTGCCGGCCTGTTCG
        Forward  BamHI              CGGGATCC-ATTGCCGGCCTGTTCG
        Reverse  XhoI               CCCGCTCGAG-AAGCAGGTTGTACAGC
ORF18   Forward  BamHI-NdeI         GCGGATCCCATATG-ATTTTGCTGCATTTGGAT
        Reverse  XhoI               CCCGCTCGAG-TCTTCCAATTTCTGAAAGC
ORF19   Forward  NdeI-NcoI          GGAATTCCATATGGCCATGG-TCGCCAGTGTTTTTACC
        Forward  BamHI              CGGGATCC-TTCGCCAGTGTTTTTACCG
        Reverse  XhoI               CCCGCTCGAG-GGTGTTTTTGAAGCTGCC
ORF20   Forward  NdeI-NcoI          GGAATTCCATATGGCCATGG-TCGGCGCGGGTATG
        Forward  BamHI              CGGGATCC-TTCGGCGCGGGTATG
        Reverse  XhoI               CCCGCTCGAG-CGGCGAGCGAGAGCA
ORF22   Forward  NdeI-NcoI          GGAATTCCATATGGCCATGG-TGATTAAAATCAAAAAAGGTCT
        Forward  BamHI              CGGGATCC-ATGATTAAAATCAAAAAAGGTCTAAACC
        Reverse  XhoI               CCCGCTCGAG-ATTATGATAGCGGCCC
ORF23   Forward  BamHI-NdeI         CGCGGATCCCATATG-GATGTTTCTGTTTCAGAC
        Reverse  XhoI               CCCGCTCGAG-TTTAAACCGATAGGTAAACG
ORF24   Forward  NdeI-NcoI          GGAATTCCATATGGCCATGG-TGATGCCGGAAATGGTG
        Forward  BamHI              CGGGATCC-ATGATGCCGGAAATGGTG
        Reverse  XhoI               CCCGCTCGAG-TGTCAGCGTGGCGCA
ORF25   Forward  BamHI-NdeI         GCGGATCCCATATG-TATCGCAAACTGATTGC
        Reverse  XhoI               CCCGCTCGAG-ATCGATGGAATAGCCG
ORF26   Forward  BamHI-NdeI         GCGGATCCCATATG-CAGCTGATCGACTATTC
        Reverse  XhoI               CCCGCTCGAG-GACATCGGCGCGTTTT
ORF27   Forward  NdeI-NcoI          GGAATTCCATATGGCCATGG-AGACCTATTCTGTTTA
        Forward  BamHI              CGGGATCC-CAGACCTATTCTGTTTATTTTAATC
        Reverse  XhoI               CCCGCTCGAG-GGGTTCGATTAAATAACCAT
ORF28   Forward  NdeI-NcoI          GGAATTCCATATGGCCATGG-ACGGCTGTACGTTGATGT
        Forward  BamHI              CGGGATCC-AACGGCTGTACGTTGATG
        Reverse  XhoI               CCCGCTCGAG-TTTGTCAGAGGAATTCGCG
ORF29   Forward  BamHI-NdeI         GCGGATCCCATATG-AACGGTTTGGATGCCCG
        Forward  BamHI-NheI         CGCGGATCCGCTAGC-AACGGTTTGGATGCCCG
        Reverse  XhoI               CCCGCTCGAG-TTTGTCTAAGTTCCTGATATG
ORF32   Forward  BamHI-NdeI         CGCGGATCCCATATG-AATACTCCTCCTTTTG
        Reverse  XhoI               CCCGCTCGAG-GCGTATTTTTTGATGCTTTG
ORF33   Forward  BamHI-NdeI         GCGGATCCCATATG-ATTGATAGGGATCGTATG
        Reverse  XhoI               CCCGCTCGAG-TTGATCTTTCAAACGGCC
ORF35   Forward  BamHI-NdeI         GCGGATCCCATATG-TTCAGAGCTCAGCTT
        Forward  BamHI-NheI         CGCGGATCCGCTAGC-TTCAGAGCTCAGCTT
        Reverse  XhoI               CCCGCTCGAG-AAACAGCCATTTGAGCGA
ORF37   Forward  BamHI-NdeI         GCGGATCCCATATG-GATGACGTATCGGATTTT
        Reverse  XhoI               CCCGCTCGAG-ATAGCCCGCTTTCAGG
ORF58   Forward  BamHI-NheI         CGCGGATCCGCTAGC-TCCGAACGCGAGTGGAT
        Reverse  XhoI               CCCGCTCGAG-AGCATTGTCCAAGGGGAC
ORF65   Forward  NdeI-NcoI          GGAATTCCATATGGCCATGG-TGCTGTATCTGAATCAAG
        Forward  BamHI              CGGGATCC-TTGCTGTATCTGAATCAAGG
        Reverse  XhoI               CCCGCTCGAG-CCGCATCGGCAGACA
ORF66   Forward  BamHI-NdeI         GCGGATCCCATATG-TACGCATTTACCGCCG
        Reverse  XhoI               CCCGCTCGAG-TGGATTTTGCAGAGATGG
ORF72   Forward  BamHI-NdeI         CGCGGATCCCATATG-AATGCAGTAAAAATATCTGA
        Reverse  XhoI               CCCGCTCGAG-GCCTGAGACCTTTGCAA
ORF73   Forward  BamHI-NdeI         GCGGATCCCATATG-AGATTTTTCGGTATCGG
        Reverse  XhoI               CCCGCTCGAG-TTCATCTTTTTCATGTTCG
ORF75   Forward  BamHI-NdeI         GCGGATCCCATATG-TCTGTCTTTCAAACGGC
        Reverse  XhoI               CCCGCTCGAG-TTTGTTTTTGCAAGACAG
ORF76   Forward  NheI-NdeI          GATCAGCTAGCCATATG-AAACAGAAAAAAACCGC
        Reverse  BamHI              CGGGATCC-TTACGGTTTGACACCGTT
ORF79   Forward  BamHI-NdeI         CGCGGATCCCATATG-GTTTCCGCCGCCG
        Reverse  XhoI               CCCGCTCGAG-GTGCTGATGCGCTTCG
ORF83   Forward  BamHI-NdeI         GCGGATCCCATATG-AAAACCCTGCTGCTGC
        Reverse  XhoI               CCCGCTCGAG-GCCGCCTTTGCGGC
ORF84   Forward  BamHI-NdeI         GCGGATCCCATATG-GCAGAGATCTGTTTG
        Reverse  XhoI               CCCGCTCGAG-GTTTGCCGATCCGACCA
ORF85   Forward  BamHI-NdeI         CGCGGATCCCATATG-GCGGTTTGGGGCGGA
        Reverse  XhoI               CCCGCTCGAG-TCGGCGCGGCGGGC
ORF89   Forward  NdeI-NcoI          GGAATTCCATATGGCCATGG-CCATACCTTCTTATCA
        Forward  BamHI              CGGGATCC-GCCATACCTTCTTATCAGAG
        Reverse  XhoI               CCCGCTCGAG-TTTTTTGCGATTAGAAAAAGC
ORF97   Forward  BamHI-NdeI         GCGGATCCCATATG-CATCCTGCCAGCGAAC
        Reverse  XhoI               CCCGCTCGAG-TTCGCCTACGGTTTTTTG
ORF98   Forward  BamHI-NdeI         GCGGATCCCATATG-ACGGTAACTGCGG
        Reverse  XhoI               CCCGCTCGAG-TTGTTGTTCGGGCAAATC
ORF100  Forward  BamHI-NdeI         GCGGATCCCATATG-TCGGGCATTTACACCG
        Reverse  XhoI               CCCGCTCGAG-ACGGGTTTCGGCGGAA
ORF101  Forward  BamHI-NdeI         GCGGATCCCATATG-ATTTATCAAAGAAACCTC
        Reverse  XhoI               CCCGCTCGAG-TTTTCCGCCTTTCAATGT
ORF102  Forward  BamHI-NdeI         GCGGATCCCATATG-GCAGGGCTGTTTTACC
        Reverse  XhoI               CCCGCTCGAG-AAACGGTTTGAACACGAC
ORF103  Forward  BamHI-NdeI         GCGGATCCCATATG-AACCACGACATCAC
        Reverse  XhoI               CCCGCTCGAG-CAGCCACAGGACGGC
ORF104  Forward  BamHI-NdeI         GCGGATCCCATATG-ACGTGGGGAACGC
        Reverse  XhoI               CCCGCTCGAG-GCGGCGTTTGAACGGC
ORF105  Forward  BamHI-NdeI         GCGGATCCCATATG-ACCAAATTTCAAACCCCTC
        Reverse  XhoI               CCCGCTCGAG-TAAACGAATGCCGTCCAG
ORF106  Forward  BamHI-NdeI         GCGGATCCCATATG-AGGATAACCGACGGCG
        Reverse  XhoI               CCCGCTCGAG-TTTGTTCCCGATGATGTT
ORF109  Forward  BamHI-NdeI         GCGGATCCCATATG-GAAGATTTATATATAATACTCG
        Reverse  XhoI               CCCGCTCGAG-ATCAGCTTCGAACCGAAG
ORF110  Forward  EcoRI              AAAGAATTC-ATGAGTAAATCCCGTAGATCTCCC
        Reverse  PstI               AAACTGCAG-GGAAAACCACATCCGCACTCTGCC
ORF111  Forward  EcoRI              AAAGAATTC-GCACCGCAAAAGGCAAAAACCGCA
        Reverse  PstI               AAACTGCAG-TCTGCGCGTTTTCGGGCAGGGTGG
ORF113  Forward  EcoRI              AAAGAATTC-ATGAACAAAACCCTCTATCGTGTGATTTTCAACCG
        Reverse  PstI               AAACTGCAG-TTACGAATGCCTGCTTGCTCGACCGTACTG
ORF115  Forward  EcoRI              AAAGAATTC-TTGCTTGTGCAAACAGAAAAAGACGG
        Reverse  SalI               AAAAAAGTCGAC-CTATTTTTTAGGGGC ITTTGC TTGTTTGAAAAGCCTGCC
ORF119  Forward  EcoRI              AAAGAATTC-TACAACATGTATCAGGAAAACCAATACCG
        Reverse  PstI               AAACTGCAG-TTATGAAAACAGGCGCAGGGCGGTTTTGCC
ORF120  Forward  EcoRI              AAAGAATTC-GCAAGGCTACCCCAATCCGCCGTG
        Reverse  PstI               AAACTGCAG-CGGTTTGGCTGCCTGGCCGTTGAT
ORF121  Forward  EcoRI              AAAGAATTC-GCCTTGGTCTGGCTGGTTTTCGC
        Reverse  PstI               AAACTGCAG-TCATCCGCCACCCCACCTCGGCCATCCATC
ORF122  Forward  SalI               AAAAAAGTCGAC-ATGTCTTACCGCGCAAGCAGTTCTCC
        Reverse  PstI               AAACTGCAG-TCAGGAACACAAACGATGACGAATATCCGTATC
ORF125  Forward  EcoRI              AAAGAATTC-GCGCTGTTTTTTGCGGCGGCGTAT
        Reverse  PstI               AAACTGCAG-CGCCGTTTCAAGACGAAAAAGTCG
ORF126  Forward  EcoRI              AAAGAATTC-GCGGAAACGGTCGAAG
        Reverse  PstI               AAACTGCAG-TTAATCTTGTCTTCCGATATAC
ORF127  Forward  EcoRI              AAAGAATTC-ATGACTGATAATCGGGGGTTTACG
        Reverse  SalI               AAAAAAGTCGAC-CTTAAGTAACTTGCAGTCCTTATC
ORF128  Forward  EcoRI              AAAGAATTC-ATGCAAGCTGTCCGCTACAGGCC
        Reverse  PstI               AAACTGCAG- CT A 2TGCAATGCGCCGCC GCGGGAATG ITT GAGCAGGCG
ORF129  Forward  EcoRI              AAAGAATTC-ATGGATTTTCGTTTTGACATTATTTACGAATACCG
        Reverse  PstI               AAACTGCAG-TTATTTTTTGATGAAATTTTGGGGCGG
ORF130  Forward  EcoRI              AAAGAATTC-GCAGTACTTGCCATTCTCGGTGCG
        Reverse  PstI               AAACTGCAG-CTCCGGATCGTCTGTAAACGCATT
ORF131  Forward  BamHI-NdeI         GCGGATCCCATATG-GAAATTCGGGCAATAAAAT
        Reverse  XhoI               CCCGCTCGAG-CCAGCGGACGCGTTC
ORF132  Forward  BamHI-NdeI         GCGGATCCCATATG-AAAGAAGCGGGGTTTG
        Reverse  XhoI               CCCGCTCGAG-CCAATCTGCCAGCCGT
ORF133  Forward  BamHI-NdeI         CGCGGATCCCATATG-GAAGATGCAGGGCGCG
        Reverse  XhoI               CCCGCTCGAG-AAACTTGTAGCTCATCGT
ORF134  Forward  BamHI-NdeI         GCGGATCCCATATG-TCTGTGCAAGCAGTATTG
        Reverse  XhoI               CCCGCTCGAG-ATCCTGTGCCAATGCG
ORF135  Forward  BamHI-NdeI         GCGGATCCCATATG-CCGTCTGAAAAAGCTTT
        Reverse  XhoI               CCCGCTCGAG-AAATACCGCTGAGGATG
ORF136  Forward  BamHI-NheI         CGCGGATCCGCTAGC-ATGAAGCGGCGTATAGCC
        Reverse  XhoI               CCCGCTCGAG-TTCCGAATATTTGGAACTTTT
ORF137  Forward  BamHI-NdeI         CGCGGATCCCATATG-GGCACGGCGGGAAATA
        Reverse  XhoI               CCCGCTCGAG-ATAACGGTATGCCGCC
ORF138  Forward  BamHI-NdeI         GCGGATCCCATATG-TTTCGTTTACAATTCAGGC
        Reverse  XhoI               CCCGCTCGAG-CGGCGTTTTATAGCGG
ORF139  Forward  BamHI-NdeI         GCGGATCCCATATG-GCTTTTTTGGCGGTAATG
        Reverse  XhoI               CCCGCTCGAG-TAACGTTTCCGTGCGTTT
ORF140  Forward  BamHI-NdeI         GCGGATCCCATATG-TTGCCCACAGGCAGC
        Reverse  XhoI               CCCGCTCGAG-GACGATGGCAAACAGC
ORF141  Forward  BamHI-NdeI         GCGGATCCCATATG-CCGTCTGAAGCAGTCT
        Reverse  XhoI               CCCGCTCGAG-ATCTGTTGTTTTTAAAATATT
ORF142  Forward  BamHI-NdeI         GCGGATCCCATATG-GATAATTCTGGTAGTGAAG
        Reverse  XhoI               CCCGCTCGAG-AAACGTATAGCCTACCT
ORF143  Forward  BamHI-NdeI         GCGGATCCCATATG-GATACCGCTTTGAACCT
        Reverse  XhoI               CCCGCTCGAG-AATGGCTTCCGCAATATG
ORF144  Forward  BamHI-NdeI         GCGGATCCCATATG-ACCTTTTTACAACGTTTGC
        Reverse  XhoI               CCCGCTCGAG-AGATTGTTGTTGTTTTTTCG
ORF147  Forward  BamHI-NdeI         GCGGATCCCATATG-TCTGTCTTTCAAACGGC
        Reverse  XhoI               CCCGCTCGAG-TTTGTTTTTGCAAGACAG



NB: 

- restriction sites are underlined 



- for ORFs 110-130, where the ORF itself carries an EcoRI site (eg. ORF122), a SalI site was used in the forward primer instead. Similarly, where the ORF carries a PstI site (eg. ORFs 115 and 127), a SalI site was used in the reverse primer.
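The primer conventions in Table I can be checked mechanically. Reading each primer as a 5' tail (the text before the hyphen) that carries the cloning sites, followed by the gene-specific sequence, is an assumption based on the table layout. The Python sketch below uses the standard recognition sequences of the enzymes named in the table and applies the SalI substitution rule from the note; the function names are hypothetical, and the guard against an internal SalI site is an added assumption not stated in the source.

```python
# Standard recognition sequences (5'->3') of the enzymes named in Table I.
# All of them are palindromic, so scanning a single strand is enough.
SITES = {
    "BamHI": "GGATCC", "NdeI": "CATATG", "NheI": "GCTAGC", "NcoI": "CCATGG",
    "XhoI": "CTCGAG", "EcoRI": "GAATTC", "PstI": "CTGCAG", "SalI": "GTCGAC",
}


def sites_in_tail(primer: str) -> list[str]:
    """Enzymes whose sites lie in the primer's 5' tail (the text before '-'),
    ordered by their position in the tail."""
    tail = primer.split("-", 1)[0].replace(" ", "").upper()
    hits = sorted((tail.find(site), name) for name, site in SITES.items() if site in tail)
    return [name for _, name in hits]


def choose_tail_enzymes(orf_dna: str) -> tuple[str, str]:
    """Rule from the note for ORFs 110-130: EcoRI (forward) and PstI (reverse)
    by default, SalI in place of either one if the ORF itself contains that site."""
    orf = orf_dna.upper()
    forward = "SalI" if SITES["EcoRI"] in orf else "EcoRI"
    reverse = "SalI" if SITES["PstI"] in orf else "PstI"
    if "SalI" in (forward, reverse) and SITES["SalI"] in orf:
        # Hypothetical guard (not in the source): SalI would then cut the insert too.
        raise ValueError("ORF also contains a SalI site")
    return forward, reverse


print(sites_in_tail("GCGGATCCCATATG-TTTGATTTCGGTTTGGG"))  # ['BamHI', 'NdeI']
print(choose_tail_enzymes("ATGGAATTCAAATAA"))  # ('SalI', 'PstI')
```

Because the choice depends only on whether a six-base site occurs inside the ORF, a plain substring test is sufficient; a real design workflow would also check the vector's multiple cloning site, which this sketch does not model.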
TABLE II - Summary of cloning, expression and purification

ORF      PCR/     His-fusion  GST-fusion  Purification
         cloning  expression  expression
orf 1    +        +           +           His-fusion
orf 2    +        +           +           GST-fusion
orf 2.1  +        n.d.        +           GST-fusion
orf 4    +        +           +           His-fusion
orf 5    +        n.d.        +           GST-fusion
orf 6    +        +           +           GST-fusion
orf 7    +        +           +           GST-fusion
orf 8    +        n.d.        n.d.
orf 9    +        +           +           GST-fusion
orf 10   +        n.d.        n.d.
orf 11   +        n.d.        n.d.
orf 13   +        n.d.        +           GST-fusion
orf 15   +        +           +           GST-fusion
orf 17   +        n.d.        n.d.
orf 18   +        n.d.        n.d.
orf 19   +        n.d.        n.d.
orf 20   +        n.d.        n.d.
orf 22   +        +           +           GST-fusion
orf 23   +        +           +           His-fusion
orf 24   +        n.d.        n.d.
orf 25   +        +           +           His-fusion
orf 26   +        n.d.        n.d.
orf 27   +        +           +           GST-fusion
orf 28            +           +           GST-fusion
orf 29   +        n.d.        n.d.
orf 32   +        +           +           His-fusion
orf 33   +        n.d.        n.d.
orf 35   +        n.d.        n.d.
orf 37   +        +           +           GST-fusion
orf 58   +        n.d.        n.d.
orf 65   +        n.d.        n.d.
orf 66   +        n.d.        n.d.
orf 72   +        +           n.d.        His-fusion
orf 73   +        n.d.        +           n.d.
orf 75   +        n.d.        n.d.
orf 76   +        +           n.d.        His-fusion
orf 79   +        +           n.d.        His-fusion
orf 83   +        n.d.        +           n.d.
orf 84   +        n.d.        n.d.
orf 85   +        n.d.        +           GST-fusion
orf 89   +        n.d.        +           GST-fusion
orf 97   +        +           +           GST-fusion
orf 98   +        n.d.        n.d.
orf 100  +        n.d.        n.d.
orf 101           n.d.        n.d.
orf 102  +        n.d.        n.d.
orf 103  +        n.d.        n.d.
orf 104  +        n.d.        n.d.
orf 105  +        n.d.        n.d.
orf 106  +        +           +           His-fusion
orf 109  +        n.d.        n.d.
orf 110  +        n.d.        n.d.
orf 111  +        +           n.d.        His-fusion
orf 113  +        +           n.d.        His-fusion
orf 115  n.d.     n.d.        n.d.
orf 119  +        +           n.d.        His-fusion
orf 120  +        +           n.d.        His-fusion
orf 121  +        n.d.        n.d.
orf 122  +        +           n.d.        His-fusion
orf 125  +        +           n.d.        His-fusion
orf 126  +        +           n.d.        His-fusion
orf 127  +        +           n.d.        His-fusion
orf 128  +        n.d.        n.d.
orf 129  +        +           n.d.        His-fusion
orf 130  +        n.d.        n.d.
orf 131  +        +           +           n.d.
orf 132  +        +           +           His-fusion
orf 133  +        n.d.        +           GST-fusion
orf 134  +        n.d.        n.d.
orf 135  +        n.d.        n.d.
orf 136  +        n.d.        n.d.
orf 137  +        n.d.        +           GST-fusion
orf 138  +        n.d.        +           GST-fusion
orf 139  +        n.d.        n.d.
orf 140  +        n.d.        n.d.
orf 141  +        n.d.        n.d.
orf 142  +        n.d.        n.d.
orf 143  +        n.d.        n.d.
orf 144  +        n.d.        +           n.d.
orf 147  +        n.d.        n.d.
CLAIMS

1. A protein comprising an amino acid sequence selected from the group consisting of SEQ 
IDs 2, 4, 6, and 8. 

2. A nucleic acid molecule which encodes a protein according to claim 1.

3. A nucleic acid molecule according to claim 2, comprising a nucleotide sequence selected
from the group consisting of SEQ IDs 1, 3, 5, and 7. 

4. A protein comprising an amino acid sequence selected from the group consisting of SEQ 
IDs 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48, 50, 52, 
54, 56, 58, 60, 62, 64, 66, 68, 70, 72, 74, 76, 78, 80, 82, 84, 86, 88, 90, 92, 94, 96, 98, 100, 102,
104, 106, 108, 110, 112, 114, 116, 118, 120, 122, 124, 126, 128, 130, 132, 134, 136, 138, 140, 142,
144, 146, 148, 150, 152, 154, 156, 158, 160, 162, 164, 166, 168, 170, 172, 174, 176, 178, 180, 182, 
184, 186, 188, 190, 192, 194, 196, 198, 200, 202, 204, 206, 208, 210, 212, 214, 216, 218, 220, 222, 
224, 226, 228, 230, 232, 234, 236, 238, 240, 242, 244, 246, 248, 250, 252, 254, 256, 258, 260, 262, 
264, 266, 268, 270, 272, 274, 276, 278, 280, 282, 284, 286, 288, 290, 292, 294, 296, 298, 300, 302,
304, 306, 308, 310, 312, 314, 316, 318, 320, 322, 324, 326, 328, 330, 332, 334, 336, 338, 340, 342,
344, 346, 348, 350, 352, 354, 356, 358, 360, 362, 364, 366, 368, 370, 372, 374, 376, 378, 380, 382, 
384, 386, 388, 390, 392, 394, 396, 398, 400, 402, 404, 406, 408, 410, 412, 414, 416, 418, 420, 422, 
424, 426, 428, 430, 432, 434, 436, 438, 440, 442, 444, 446, 448, 450, 452, 454, 456, 458, 460, 462, 
464, 466, 468, 470, 472, 474, 476, 478, 480, 482, 484, 486, 488, 490, 492, 494, 496, 498, 500, 502,
504, 506, 508, 510, 512, 514, 516, 518, 520, 522, 524, 526, 528, 530, 532, 534, 536, 538, 540, 542,
544, 546, 548, 550, 552, 554, 556, 558, 560, 562, 564, 566, 568, 570, 572, 574, 576, 578, 580, 582, 
584, 586, 588, 590, 592, 594, 596, 598, 600, 602, 604, 606, 608, 610, 612, 614, 616, 618, 620, 622, 
624, 626, 628, 630, 632, 634, 636, 638, 640, 642, 644, 646, 648, 650, 652, 654, 656, 658, 660, 662, 
664, 666, 668, 670, 672, 674, 676, 678, 680, 682, 684, 686, 688, 690, 692, 694, 696, 698, 700, 702,
704, 706, 708, 710, 712, 714, 716, 718, 720, 722, 724, 726, 728, 730, 732, 734, 736, 738, 740, 742,
744, 746, 748, 750, 752, 754, 756, 758, 760, 762, 764, 766, 768, 770, 772, 774, 776, 778, 780, 782, 
784, 786, 788, 790, 792, 794, 796, 798, 800, 802, 804, 806, 808, 810, 812, 814, 816, 818, 820, 822, 
824, 826, 828, 830, 832, 834, 836, 838, 840, 842, 844, 846, 848, 850, 852, 854, 856, 858, 860, 862, 
864, 866, 868, 870, 872, 874, 876, 878, 880, 882, 884, 886, 888, 890, & 892.

5. A protein having 50% or greater sequence identity to a protein according to claim 4.
6. A protein comprising a fragment of an amino acid sequence selected from the group
consisting of SEQ IDs 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 
44, 46, 48, 50, 52, 54, 56, 58, 60, 62, 64, 66, 68, 70, 72, 74, 76, 78, 80, 82, 84, 86, 88, 90, 92, 94, 
96, 98, 100, 102, 104, 106, 108, 110, 112, 114, 116, 118, 120, 122, 124, 126, 128, 130, 132, 134,
136, 138, 140, 142, 144, 146, 148, 150, 152, 154, 156, 158, 160, 162, 164, 166, 168, 170, 172, 174,
176, 178, 180, 182, 184, 186, 188, 190, 192, 194, 196, 198, 200, 202, 204, 206, 208, 210, 212, 214, 
216, 218, 220, 222, 224, 226, 228, 230, 232, 234, 236, 238, 240, 242, 244, 246, 248, 250, 252, 254, 
256, 258, 260, 262, 264, 266, 268, 270, 272, 274, 276, 278, 280, 282, 284, 286, 288, 290, 292, 294, 
296, 298, 300, 302, 304, 306, 308, 310, 312, 314, 316, 318, 320, 322, 324, 326, 328, 330, 332, 334,
336, 338, 340, 342, 344, 346, 348, 350, 352, 354, 356, 358, 360, 362, 364, 366, 368, 370, 372, 374,
376, 378, 380, 382, 384, 386, 388, 390, 392, 394, 396, 398, 400, 402, 404, 406, 408, 410, 412, 414, 
416, 418, 420, 422, 424, 426, 428, 430, 432, 434, 436, 438, 440, 442, 444, 446, 448, 450, 452, 454, 
456, 458, 460, 462, 464, 466, 468, 470, 472, 474, 476, 478, 480, 482, 484, 486, 488, 490, 492, 494, 
496, 498, 500, 502, 504, 506, 508, 510, 512, 514, 516, 518, 520, 522, 524, 526, 528, 530, 532, 534,
536, 538, 540, 542, 544, 546, 548, 550, 552, 554, 556, 558, 560, 562, 564, 566, 568, 570, 572, 574,
576, 578, 580, 582, 584, 586, 588, 590, 592, 594, 596, 598, 600, 602, 604, 606, 608, 610, 612, 614, 
616, 618, 620, 622, 624, 626, 628, 630, 632, 634, 636, 638, 640, 642, 644, 646, 648, 650, 652, 654, 
656, 658, 660, 662, 664, 666, 668, 670, 672, 674, 676, 678, 680, 682, 684, 686, 688, 690, 692, 694, 
696, 698, 700, 702, 704, 706, 708, 710, 712, 714, 716, 718, 720, 722, 724, 726, 728, 730, 732, 734,
736, 738, 740, 742, 744, 746, 748, 750, 752, 754, 756, 758, 760, 762, 764, 766, 768, 770, 772, 774,
776, 778, 780, 782, 784, 786, 788, 790, 792, 794, 796, 798, 800, 802, 804, 806, 808, 810, 812, 814, 
816, 818, 820, 822, 824, 826, 828, 830, 832, 834, 836, 838, 840, 842, 844, 846, 848, 850, 852, 854, 
856, 858, 860, 862, 864, 866, 868, 870, 872, 874, 876, 878, 880, 882, 884, 886, 888, 890, & 892.

7. An antibody which binds to a protein according to any one of claims 4 to 6. 

8. A nucleic acid molecule which encodes a protein according to any one of claims 4 to 6.

9. A nucleic acid molecule according to claim 8, comprising a nucleotide sequence selected 
from the group consisting of SEQ IDs 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 
37, 39, 41, 43, 45, 47, 49, 51, 53, 55, 57, 59, 61, 63, 65, 67, 69, 71, 73, 75, 77, 79, 81, 83, 85, 87, 
89, 91, 93, 95, 97, 99, 101, 103, 105, 107, 109, 111, 113, 115, 117, 119, 121, 123, 125, 127, 129,
131, 133, 135, 137, 139, 141, 143, 145, 147, 149, 151, 153, 155, 157, 159, 161, 163, 165, 167, 169,
171, 173, 175, 177, 179, 181, 183, 185, 187, 189, 191, 193, 195, 197, 199, 201, 203, 205, 207, 209, 
211, 213, 215, 217, 219, 221, 223, 225, 227, 229, 231, 233, 235, 237, 239, 241, 243, 245, 247, 249, 
251, 253, 255, 257, 259, 261, 263, 265, 267, 269, 271, 273, 275, 277, 279, 281, 283, 285, 287, 289, 
291, 293, 295, 297, 299, 301, 303, 305, 307, 309, 311, 313, 315, 317, 319, 321, 323, 325, 327, 329,
331, 333, 335, 337, 339, 341, 343, 345, 347, 349, 351, 353, 355, 357, 359, 361, 363, 365, 367, 369, 
371, 373, 375, 377, 379, 381, 383, 385, 387, 389, 391, 393, 395, 397, 399, 401, 403, 405, 407, 409, 
411, 413, 415, 417, 419, 421, 423, 425, 427, 429, 431, 433, 435, 437, 439, 441, 443, 445, 447, 449, 
5 451, 453, 455, 457, 459, 461, 463, 465, 467, 469, 471, 473, 475, 477, 479, 481, 483, 485, 487, 489, 
491, 493, 495, 497, 499, 501, 503, 505, 507, 509, 511, 513, 515, 517, 519, 521, 523, 525, 527, 529, 
531, 533, 535, 537, 539, 541, 543, 545, 547, 549, 551, 553, 555, 557, 559, 561, 563, 565, 567, 569, 
571, 573, 575, 577, 579, 581, 583, 585, 587, 589, 591, 593, 595, 597, 599, 601, 603, 605, 607, 609, 
611, 613, 615, 617, 619, 621, 623, 625, 627, 629, 631, 633, 635, 637, 639, 641, 643, 645, 647, 649, 

10 651, 653, 655, 657, 659, 661, 663, 665, 667, 669, 671, 673, 675, 677, 679, 681, 683, 685, 687, 689, 
691, 693, 695, 697, 699, 701, 703, 705, 707, 709, 711, 713, 715, 717, 719, 721, 723, 725, 727, 729, 
731, 733, 735, 737, 739, 741, 743, 745, 747, 749, 751, 753, 755, 757, 759, 761, 763, 765, 767, 769, 
771, 773, 775, 777, 779, 781, 783, 785, 787, 789, 791, 793, 795, 797, 799, 801, 803, 805, 807, 809, 
81 1, 813, 815, 817, 819, 821, 823, 825, 827, 829, 831, 833, 835, 837, 839, 841, 843, 845, 847, 849, 

15 851, 853, 855, 857, 859, 861, 863, 865, 867, 869, 871, 873, 875, 877, 879, 881, 883, 885, 887, 889, 
& 891.. 

10. A nucleic acid molecule comprising a fragment of a nucleotide sequence selected from the 
group consisting of SEQ IDs 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 
41, 43, 45, 47, 49, 51, 53, 55, 57, 59, 61, 63, 65, 67, 69, 71, 73, 75, 77, 79, 81, 83, 85, 87, 89, 91, 
93, 95, 97, 99, 101, 103, 105, 107, 109, 111, 113, 115, 117, 119, 121, 123, 125, 127, 129, 131, 133,
135, 137, 139, 141, 143, 145, 147, 149, 151, 153, 155, 157, 159, 161, 163, 165, 167, 169, 171, 173,
175, 177, 179, 181, 183, 185, 187, 189, 191, 193, 195, 197, 199, 201, 203, 205, 207, 209, 211, 213,
215, 217, 219, 221, 223, 225, 227, 229, 231, 233, 235, 237, 239, 241, 243, 245, 247, 249, 251, 253,
255, 257, 259, 261, 263, 265, 267, 269, 271, 273, 275, 277, 279, 281, 283, 285, 287, 289, 291, 293,
295, 297, 299, 301, 303, 305, 307, 309, 311, 313, 315, 317, 319, 321, 323, 325, 327, 329, 331, 333,
335, 337, 339, 341, 343, 345, 347, 349, 351, 353, 355, 357, 359, 361, 363, 365, 367, 369, 371, 373,
375, 377, 379, 381, 383, 385, 387, 389, 391, 393, 395, 397, 399, 401, 403, 405, 407, 409, 411, 413,
415, 417, 419, 421, 423, 425, 427, 429, 431, 433, 435, 437, 439, 441, 443, 445, 447, 449, 451, 453,
455, 457, 459, 461, 463, 465, 467, 469, 471, 473, 475, 477, 479, 481, 483, 485, 487, 489, 491, 493,
495, 497, 499, 501, 503, 505, 507, 509, 511, 513, 515, 517, 519, 521, 523, 525, 527, 529, 531, 533,
535, 537, 539, 541, 543, 545, 547, 549, 551, 553, 555, 557, 559, 561, 563, 565, 567, 569, 571, 573,
575, 577, 579, 581, 583, 585, 587, 589, 591, 593, 595, 597, 599, 601, 603, 605, 607, 609, 611, 613,
615, 617, 619, 621, 623, 625, 627, 629, 631, 633, 635, 637, 639, 641, 643, 645, 647, 649, 651, 653,
655, 657, 659, 661, 663, 665, 667, 669, 671, 673, 675, 677, 679, 681, 683, 685, 687, 689, 691, 693,
695, 697, 699, 701, 703, 705, 707, 709, 711, 713, 715, 717, 719, 721, 723, 725, 727, 729, 731, 733,
735, 737, 739, 741, 743, 745, 747, 749, 751, 753, 755, 757, 759, 761, 763, 765, 767, 769, 771, 773,
775, 777, 779, 781, 783, 785, 787, 789, 791, 793, 795, 797, 799, 801, 803, 805, 807, 809, 811, 813,
815, 817, 819, 821, 823, 825, 827, 829, 831, 833, 835, 837, 839, 841, 843, 845, 847, 849, 851, 853,
855, 857, 859, 861, 863, 865, 867, 869, 871, 873, 875, 877, 879, 881, 883, 885, 887, 889, & 891.

11. A nucleic acid molecule comprising a nucleotide sequence complementary to a nucleic acid
molecule according to any one of claims 8 to 10.

12. A nucleic acid molecule comprising a nucleotide sequence having 50% or greater sequence
identity to a nucleic acid molecule according to any one of claims 8-11.

13. A nucleic acid molecule which can hybridise to a nucleic acid molecule according to any
one of claims 8-12 under high stringency conditions.

14. A composition comprising a protein, a nucleic acid molecule, or an antibody according to 
any preceding claim. 

15. A composition according to claim 14 being a vaccine composition or a diagnostic 
composition. 

16. A composition according to claim 14 or claim 15 for use as a pharmaceutical.

17. The use of a composition according to claim 14 in the manufacture of a medicament for the
treatment or prevention of infection due to Neisserial bacteria.